{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1> Structured data prediction using BigQuery ML </h1>\n",
    "\n",
    "This notebook illustrates:\n",
    "<ol>\n",
    "<li> Training Machine Learning models using BQML\n",
    "<li> Predicting with model\n",
    "<li> Using spatial queries in BigQuery\n",
    "<li> Building a linear regression model with feature crosses\n",
    "</ol>\n",
    "\n",
    "The goal is to predict taxifare given the starting and ending points.\n",
    "\n",
    "\n",
    "## Set up notebook environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install google-cloud-bigquery seaborn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Restart the kernel so that the new packages are picked up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# change these to try this notebook out\n",
    "PROJECT = 'cloud-training-demos'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ['PROJECT'] = PROJECT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Updated property [core/project].\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create BigQuery output dataset\n",
    "\n",
    "If necessary, create a BigQuery dataset to store the trained model and artifacts of training.\n",
    "(you can also do this from the GCP web console)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "#bq mk demos"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import bigquery\n",
    "bq = bigquery.Client(project=PROJECT)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create BigQuery training dataset\n",
    "\n",
    "Please see [this notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/datasets/create_datasets.ipynb) for more context on this problem and how the features were chosen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import shutil"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_input_dataset(split, sample=1000):\n",
    "  \"\"\"\n",
    "  split is TRAIN or EVAL\n",
    "  sample=1000 pulls 1/1000 of full dataset\n",
    "  \"\"\"\n",
    "  \n",
    "  query=\"\"\"\n",
    "  WITH params AS (\n",
    "    SELECT\n",
    "    1 AS TRAIN,\n",
    "    2 AS EVAL\n",
    "    ),\n",
    "\n",
    "  daynames AS\n",
    "    (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
    "\n",
    "  taxitrips AS (\n",
    "  SELECT\n",
    "    (tolls_amount + fare_amount) AS total_fare,\n",
    "    daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
    "    EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
    "    pickup_longitude AS pickuplon,\n",
    "    pickup_latitude AS pickuplat,\n",
    "    dropoff_longitude AS dropofflon,\n",
    "    dropoff_latitude AS dropofflat,\n",
    "    passenger_count AS passengers\n",
    "  FROM\n",
    "    `nyc-tlc.yellow.trips`, daynames, params\n",
    "  WHERE\n",
    "    trip_distance > 0 AND fare_amount > 0 AND fare_amount < 100\n",
    "    AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), {0})) = params.{1}\n",
    "  )\n",
    "\n",
    "  SELECT *\n",
    "  FROM taxitrips\n",
    "  \"\"\".format(sample, split)\n",
    "  \n",
    "  return query"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note a few things about the query:\n",
    "* The main part of the query is at the bottom: (SELECT * from taxitrips).\n",
    "* taxitrips does the bulk of the extraction for the NYC dataset, with the SELECT containing my training features and label.\n",
    "* The WHERE removes data that I don't want to train on.\n",
    "* The WHERE also includes a sampling clause to pick up only 1/1000th of the data\n",
    "* I define a variable called TRAIN so that I can quickly build an independent EVAL set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "  WITH params AS (\n",
      "    SELECT\n",
      "    1 AS TRAIN,\n",
      "    2 AS EVAL\n",
      "    ),\n",
      "\n",
      "  daynames AS\n",
      "    (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
      "\n",
      "  taxitrips AS (\n",
      "  SELECT\n",
      "    (tolls_amount + fare_amount) AS total_fare,\n",
      "    daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
      "    EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
      "    pickup_longitude AS pickuplon,\n",
      "    pickup_latitude AS pickuplat,\n",
      "    dropoff_longitude AS dropofflon,\n",
      "    dropoff_latitude AS dropofflat,\n",
      "    passenger_count AS passengers\n",
      "  FROM\n",
      "    `nyc-tlc.yellow.trips`, daynames, params\n",
      "  WHERE\n",
      "    trip_distance > 0 AND fare_amount > 0\n",
      "    AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 1000)) = params.TRAIN\n",
      "  )\n",
      "\n",
      "  SELECT *\n",
      "  FROM taxitrips\n",
      "  \n"
     ]
    }
   ],
   "source": [
    "query = create_input_dataset('TRAIN')\n",
    "print(query)\n",
    "trips = bq.query(query + \" LIMIT 1000\", project=PROJECT).to_dataframe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_fare</th>\n",
       "      <th>dayofweek</th>\n",
       "      <th>hourofday</th>\n",
       "      <th>pickuplon</th>\n",
       "      <th>pickuplat</th>\n",
       "      <th>dropofflon</th>\n",
       "      <th>dropofflat</th>\n",
       "      <th>passengers</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.5</td>\n",
       "      <td>Sun</td>\n",
       "      <td>12</td>\n",
       "      <td>-73.982177</td>\n",
       "      <td>40.770006</td>\n",
       "      <td>-73.981291</td>\n",
       "      <td>40.769298</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.5</td>\n",
       "      <td>Mon</td>\n",
       "      <td>15</td>\n",
       "      <td>-73.986812</td>\n",
       "      <td>40.759453</td>\n",
       "      <td>-73.986258</td>\n",
       "      <td>40.759607</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.0</td>\n",
       "      <td>Tues</td>\n",
       "      <td>6</td>\n",
       "      <td>-73.975887</td>\n",
       "      <td>40.757060</td>\n",
       "      <td>-73.973400</td>\n",
       "      <td>40.760550</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.5</td>\n",
       "      <td>Thurs</td>\n",
       "      <td>21</td>\n",
       "      <td>-73.954017</td>\n",
       "      <td>40.770487</td>\n",
       "      <td>-73.949420</td>\n",
       "      <td>40.772747</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3.5</td>\n",
       "      <td>Sun</td>\n",
       "      <td>1</td>\n",
       "      <td>-73.989583</td>\n",
       "      <td>40.691210</td>\n",
       "      <td>-73.987220</td>\n",
       "      <td>40.692317</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   total_fare dayofweek  hourofday  pickuplon  pickuplat  dropofflon  \\\n",
       "0         2.5       Sun         12 -73.982177  40.770006  -73.981291   \n",
       "1         2.5       Mon         15 -73.986812  40.759453  -73.986258   \n",
       "2         3.0      Tues          6 -73.975887  40.757060  -73.973400   \n",
       "3         3.5     Thurs         21 -73.954017  40.770487  -73.949420   \n",
       "4         3.5       Sun          1 -73.989583  40.691210  -73.987220   \n",
       "\n",
       "   dropofflat  passengers  \n",
       "0   40.769298           1  \n",
       "1   40.759607           2  \n",
       "2   40.760550           5  \n",
       "3   40.772747           1  \n",
       "4   40.692317           1  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trips.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_fare</th>\n",
       "      <th>hourofday</th>\n",
       "      <th>pickuplon</th>\n",
       "      <th>pickuplat</th>\n",
       "      <th>dropofflon</th>\n",
       "      <th>dropofflat</th>\n",
       "      <th>passengers</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1000.000000</td>\n",
       "      <td>1000.000000</td>\n",
       "      <td>1000.000000</td>\n",
       "      <td>1000.000000</td>\n",
       "      <td>1000.000000</td>\n",
       "      <td>1000.000000</td>\n",
       "      <td>1000.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>11.193580</td>\n",
       "      <td>13.615000</td>\n",
       "      <td>-72.422339</td>\n",
       "      <td>39.936178</td>\n",
       "      <td>-72.716949</td>\n",
       "      <td>40.059515</td>\n",
       "      <td>1.79800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>9.624011</td>\n",
       "      <td>6.523598</td>\n",
       "      <td>10.612316</td>\n",
       "      <td>5.708082</td>\n",
       "      <td>9.567607</td>\n",
       "      <td>5.270821</td>\n",
       "      <td>1.38965</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>2.500000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>-74.044723</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>-74.177602</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>6.000000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>-73.991939</td>\n",
       "      <td>40.735623</td>\n",
       "      <td>-73.991719</td>\n",
       "      <td>40.734412</td>\n",
       "      <td>1.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>8.000000</td>\n",
       "      <td>14.000000</td>\n",
       "      <td>-73.980697</td>\n",
       "      <td>40.752287</td>\n",
       "      <td>-73.980773</td>\n",
       "      <td>40.753813</td>\n",
       "      <td>1.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>12.500000</td>\n",
       "      <td>19.000000</td>\n",
       "      <td>-73.967146</td>\n",
       "      <td>40.767698</td>\n",
       "      <td>-73.962595</td>\n",
       "      <td>40.768420</td>\n",
       "      <td>2.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>69.650000</td>\n",
       "      <td>23.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>40.835693</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>40.908057</td>\n",
       "      <td>6.00000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        total_fare    hourofday    pickuplon    pickuplat   dropofflon  \\\n",
       "count  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000   \n",
       "mean     11.193580    13.615000   -72.422339    39.936178   -72.716949   \n",
       "std       9.624011     6.523598    10.612316     5.708082     9.567607   \n",
       "min       2.500000     0.000000   -74.044723     0.000000   -74.177602   \n",
       "25%       6.000000     9.000000   -73.991939    40.735623   -73.991719   \n",
       "50%       8.000000    14.000000   -73.980697    40.752287   -73.980773   \n",
       "75%      12.500000    19.000000   -73.967146    40.767698   -73.962595   \n",
       "max      69.650000    23.000000     0.000000    40.835693     0.000000   \n",
       "\n",
       "        dropofflat  passengers  \n",
       "count  1000.000000  1000.00000  \n",
       "mean     40.059515     1.79800  \n",
       "std       5.270821     1.38965  \n",
       "min       0.000000     0.00000  \n",
       "25%      40.734412     1.00000  \n",
       "50%      40.753813     1.00000  \n",
       "75%      40.768420     2.00000  \n",
       "max      40.908057     6.00000  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trips.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create ML model training query\n",
    "\n",
    "This is the query to train the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_train_query(dataset_query, model_name):\n",
    "  query=\"\"\"\n",
    "CREATE or REPLACE MODEL {0}\n",
    "OPTIONS\n",
    "  (model_type='linear_reg', labels=['total_fare'], min_rel_progress=0.005, l2_reg=0.1) AS\n",
    "  \n",
    "  {1}\n",
    "  \n",
    "  \"\"\".format(model_name, dataset_query)\n",
    "  return query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "CREATE or REPLACE MODEL demos.taxifare_model\n",
      "OPTIONS\n",
      "  (model_type='linear_reg', labels=['total_fare'], min_rel_progress=0.005, ls_init_learn_rate=0.1) AS\n",
      "  \n",
      "  \n",
      "  WITH params AS (\n",
      "    SELECT\n",
      "    1 AS TRAIN,\n",
      "    2 AS EVAL\n",
      "    ),\n",
      "\n",
      "  daynames AS\n",
      "    (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
      "\n",
      "  taxitrips AS (\n",
      "  SELECT\n",
      "    (tolls_amount + fare_amount) AS total_fare,\n",
      "    daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
      "    EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
      "    pickup_longitude AS pickuplon,\n",
      "    pickup_latitude AS pickuplat,\n",
      "    dropoff_longitude AS dropofflon,\n",
      "    dropoff_latitude AS dropofflat,\n",
      "    passenger_count AS passengers\n",
      "  FROM\n",
      "    `nyc-tlc.yellow.trips`, daynames, params\n",
      "  WHERE\n",
      "    trip_distance > 0 AND fare_amount > 0\n",
      "    AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 1000)) = params.TRAIN\n",
      "  )\n",
      "\n",
      "  SELECT *\n",
      "  FROM taxitrips\n",
      "  \n",
      "  \n",
      "  \n"
     ]
    }
   ],
   "source": [
    "train_query = create_train_query( create_input_dataset('TRAIN'), 'demos.taxifare_model' )\n",
    "print(train_query)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note a few things about the above query:\n",
    "* CREATE model is a safe way to ensure that you don't overwrite existing models. CREATE or REPLACE will … replace existing models.\n",
    "* I specify my model type. Use linear_reg for regression problems and logistic_reg for classification problems.\n",
    "* I specify that the total_fare column is the label.\n",
    "* I ask that model training stop when the improvement is < 0.5%\n",
    "* I specify the initial learning rate at 0.1 (this is optional, but shows you how to specify any optional parameters)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the ML model\n",
    "\n",
    "This will take <b>5-10 min</b>.  Wait for a message of the form \"Job xyz completed\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<google.cloud.bigquery.table._EmptyRowIterator at 0x7fa39f506630>"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bq.query(train_query, project=PROJECT).result()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the above job is complete, you can look at the training loss:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT iteration, loss from ML.TRAINING_INFO(MODEL demos.taxifare_model)\n",
      "   iteration      loss\n",
      "0          0  9.463086\n"
     ]
    }
   ],
   "source": [
    "def show_training_loss(model_name):\n",
    "  query = \"SELECT iteration, loss from ML.TRAINING_INFO(MODEL {})\".format(model_name)\n",
    "  print(query)\n",
    "  loss_df = bq.query(query, project=PROJECT).to_dataframe()\n",
    "  loss_df['loss'] = np.sqrt(loss_df['loss'])  # mean square error to RMSE\n",
    "\n",
    "  if len(loss_df) > 1:\n",
    "    # Sometimes, BigQuery can compute a closed form solution.\n",
    "    # See: https://medium.com/google-cloud/bigquery-ml-gets-faster-by-computing-a-closed-form-solution-sometimes-1baa5a838eb6\n",
    "    loss_df.plot(x='iteration', y='loss');\n",
    "  else:\n",
    "    print(loss_df)\n",
    "show_training_loss('demos.taxifare_model');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate the model on an independent dataset\n",
    "\n",
    "Let's look at overall RMSE (notice the use of ML.EVALUATE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "SELECT\n",
      "  *,\n",
      "  SQRT( mean_squared_error ) AS rmse\n",
      "FROM\n",
      "  ML.EVALUATE(MODEL demos.taxifare_model,\n",
      "  (\n",
      "  \n",
      "  WITH params AS (\n",
      "    SELECT\n",
      "    1 AS TRAIN,\n",
      "    2 AS EVAL\n",
      "    ),\n",
      "\n",
      "  daynames AS\n",
      "    (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
      "\n",
      "  taxitrips AS (\n",
      "  SELECT\n",
      "    (tolls_amount + fare_amount) AS total_fare,\n",
      "    daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
      "    EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
      "    pickup_longitude AS pickuplon,\n",
      "    pickup_latitude AS pickuplat,\n",
      "    dropoff_longitude AS dropofflon,\n",
      "    dropoff_latitude AS dropofflat,\n",
      "    passenger_count AS passengers\n",
      "  FROM\n",
      "    `nyc-tlc.yellow.trips`, daynames, params\n",
      "  WHERE\n",
      "    trip_distance > 0 AND fare_amount > 0\n",
      "    AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 1000)) = params.EVAL\n",
      "  )\n",
      "\n",
      "  SELECT *\n",
      "  FROM taxitrips\n",
      "  \n",
      "  ))  \n",
      "  \n"
     ]
    }
   ],
   "source": [
    "def create_eval_query(dataset_query, model_name):\n",
    "  query=\"\"\"\n",
    "SELECT\n",
    "  *,\n",
    "  SQRT( mean_squared_error ) AS rmse\n",
    "FROM\n",
    "  ML.EVALUATE(MODEL {0},\n",
    "  (\n",
    "  {1}\n",
    "  ))  \n",
    "  \"\"\".format(model_name, dataset_query)\n",
    "  return query\n",
    "eval_query = create_eval_query( create_input_dataset('EVAL'), 'demos.taxifare_model' )\n",
    "print(eval_query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_absolute_error</th>\n",
       "      <th>mean_squared_error</th>\n",
       "      <th>mean_squared_log_error</th>\n",
       "      <th>median_absolute_error</th>\n",
       "      <th>r2_score</th>\n",
       "      <th>explained_variance</th>\n",
       "      <th>rmse</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5.927267</td>\n",
       "      <td>89.806649</td>\n",
       "      <td>0.323691</td>\n",
       "      <td>4.501331</td>\n",
       "      <td>0.000915</td>\n",
       "      <td>0.000917</td>\n",
       "      <td>9.476637</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
       "0             5.927267           89.806649                0.323691   \n",
       "\n",
       "   median_absolute_error  r2_score  explained_variance      rmse  \n",
       "0               4.501331  0.000915            0.000917  9.476637  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eval_df = bq.query(eval_query, project=PROJECT).to_dataframe()\n",
    "eval_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can write a more sophisticated evaluation that computes the mean absolute percent error (MAPE) and group it by the taxifare to see how the errors vary with amount (notice the use of ML.PREDICT):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "WITH predictions AS (\n",
      "  SELECT\n",
      "    total_fare,\n",
      "    ABS(total_fare - predicted_total_fare)/total_fare AS error,\n",
      "    ROUND(total_fare) AS dollars\n",
      "  FROM\n",
      "  ML.PREDICT(MODEL demos.taxifare_model,\n",
      "  (\n",
      "  \n",
      "  WITH params AS (\n",
      "    SELECT\n",
      "    1 AS TRAIN,\n",
      "    2 AS EVAL\n",
      "    ),\n",
      "\n",
      "  daynames AS\n",
      "    (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
      "\n",
      "  taxitrips AS (\n",
      "  SELECT\n",
      "    (tolls_amount + fare_amount) AS total_fare,\n",
      "    daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
      "    EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
      "    pickup_longitude AS pickuplon,\n",
      "    pickup_latitude AS pickuplat,\n",
      "    dropoff_longitude AS dropofflon,\n",
      "    dropoff_latitude AS dropofflat,\n",
      "    passenger_count AS passengers\n",
      "  FROM\n",
      "    `nyc-tlc.yellow.trips`, daynames, params\n",
      "  WHERE\n",
      "    trip_distance > 0 AND fare_amount > 0 AND fare_amount < 100\n",
      "    AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 1000)) = params.EVAL\n",
      "  )\n",
      "\n",
      "  SELECT *\n",
      "  FROM taxitrips\n",
      "  \n",
      "  )))\n",
      "\n",
      "SELECT\n",
      "  dollars,\n",
      "  -- mean absolute percent error\n",
      "  AVG(100 * error) AS MAPE\n",
      "FROM predictions\n",
      "GROUP BY dollars\n",
      "ORDER BY\n",
      "  dollars\n",
      "  \n"
     ]
    }
   ],
   "source": [
    "def create_faceted_eval_query(dataset_query, model_name):\n",
    "  query=\"\"\"\n",
    "WITH predictions AS (\n",
    "  SELECT\n",
    "    total_fare,\n",
    "    ABS(total_fare - predicted_total_fare)/total_fare AS error,\n",
    "    ROUND(total_fare) AS dollars\n",
    "  FROM\n",
    "  ML.PREDICT(MODEL {0},\n",
    "  (\n",
    "  {1}\n",
    "  )))\n",
    "\n",
    "SELECT\n",
    "  dollars,\n",
    "  -- mean absolute percent error\n",
    "  AVG(100 * error) AS MAPE\n",
    "FROM predictions\n",
    "GROUP BY dollars\n",
    "ORDER BY\n",
    "  dollars\n",
    "  \"\"\".format(model_name, dataset_query)\n",
    "  return query\n",
    "eval_query = create_faceted_eval_query( create_input_dataset('EVAL'), 'demos.taxifare_model')\n",
    "print(eval_query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0, 100)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3Xl8VPW9//HXJwsJIZCQhQAJS4Cwg4CIyCaKCyIKRatYtW7V1tpqab3VLj+1vbe3y+2tta0brVuvFpeKSuu+oCibQlhkE5IQIAFCCCTsIcv398cMNtCELTNzJjPv5+ORR2bOnDnnM5OTec853+/5HnPOISIi0SvG6wJERMRbCgIRkSinIBARiXIKAhGRKKcgEBGJcgoCEZEod8IgMLMnzWyHma1qMC3NzN41sw3+3+39083M/mBmBWa20syGBbN4ERFpvpPZI3gamHjMtHuB951zecD7/vsAlwB5/p/bgEcDU6aIiATLCYPAOTcP2HXM5CnAM/7bzwBTG0z/q/NZBKSaWadAFSsiIoEXd5rPy3LObfPf3g5k+W9nA1sazFfin7aNY5jZbfj2GmjTps2Zffv2PaUCNpTtIz7W6J7R5hRLFxGJDEuXLt3pnMts7nJONwi+5JxzZnbK41Q452YCMwGGDx/ulixZckrP/8GLK/hofTmf/WQCZnaqqxcRafHMbFMglnO6vYbKjhzy8f/e4Z9eCnRpMF+Of1rADcpux8591ZTtqQ7G4kVEosbpBsEc4Ab/7RuA1xpM/7q/99BIoKrBIaSAGpSTAsDnpVXBWLyISNQ4me6js4CFQB8zKzGzW4BfARea2QbgAv99gDeAIqAA+DPw7aBUDfTr1I4YUxCIiDTXCdsInHPXNPHQhEbmdcAdzS3qZCS1iqNnZjKrFAQiUaGmpoaSkhIOHTrkdSkhl5iYSE5ODvHx8UFZfrMbi700KDuFTwp2el2GiIRASUkJbdu2pXv37lHVQcQ5R0VFBSUlJeTm5gZlHS16iImB2Sns2FvNjj3R9w1BJNocOnSI9PT0qAoBADMjPT09qHtCLToI1GAsEl2iLQSOCPbrbtFB0L9TO0wNxiIizdKig6BNQhw9MtqowVhEQsLMuO666768X1tbS2ZmJpMnTz5qvqlTpzJy5Mijpj3wwANkZ2czZMgQBg4cyJw5c/5t+pGfysrK4L+YBlp0EICvwVh7BCISCm3atGHVqlUcPHgQgHfffZfs7Oyj5qmsrGTp0qVUVVVRVFR01GMzZsxg+fLlvPTSS9x8883U19cfNf3IT2pqamhekF+LD4KB2SmU7almx141GItI8E2aNInXX38dgFmzZnHNNUf3sJ89ezaXXXYZ06dP5/nnn290Gf369SMuLo6dO8Oj12OL7j4Kvj0CgNWle+jQN9HjakQkFH72j9Ws2bonoMvs37kd91824ITzTZ8+nZ///OdMnjyZlStXcvPNN/Pxxx9/+fisWbO47777yMrK4oorruDHP/7xvy1j8eLFxMTEkJnpGy/uwQcf5NlnnwWgffv2zJ07N0Cv6uS0+CAY4A+ClSVVnNe3g8fViEikGzx4MMXFxcyaNYtJkyYd9VhZWRkbNmxgzJgxmBnx8fGsWrWKgQMHAv/6wG/bti0vvPDCl72BZsyYwd133x3y13JEiw+C5IQ4+nZsy+KNFfiuhyMike5kvrkH0+WXX87dd9/Nhx9+SEVFxZfTX3zxRXbv3v3liV979uxh1qxZ/OIXvwC8/8BvSotvIwAY1zuTJcW7OXC41utSRCQK3Hzzzdx///0MGjToqOmzZs3irbfeori4mOLiYpYuXdpkO0E4iYggGJuXweG6ehZvPPZCaiIigZeTk8Odd9551LTi4mI2bdp0VLfR3NxcUlJSWLx48XGX9+CDDx7VfbS4uDgYZTfJfOPEeet0LkzT0KGaOs742Ttce3Y37rusfwArE5FwsXbtWvr16+d1GZ5p7PWb2VLn3PDmLjsi9ggS42MZkZvGxxvKvS5FRKTFiYggABiXl8mGHfvYVnXQ61JERFqUiAmCsb0zAPh4Q3icoCEigRcOh7K9EOzXHTFB0CerLZltExQEIhEqMTGRioqKqAuDI9cjSEwM3gmzLf48giPMjLF5Gcxdt4P6ekdMTHQOVysSqXJycigpKaG8PPraAo9coSxYIiYIwNdOMDu/lNVb93x5rQIRiQzx8fFBu0JXtIuYQ0MAo3v52gnmqfeQiMhJi6ggyGybQP9O7dSNVETkFERUEICv99DSTbvZX63hJkRETkbEBcG4vExq6px/EDoRETmRiAuCM7u1JzE+hnnr1Y1URORkRFwQJMbHcnZuutoJREROUsQFAfhGIy0s309ppYabEBE5kYgMgnG9fZd/+0R7BSIiJxSRQZDXIZmsdgnM03ATIiInFJFB4BtuIpP5BTupq4+ucUlERE5VRAYB+NoJKg/UsKq0yutSRETCWsQGwZheR4alVjuBiMjxRGwQpCcnMDC7ndoJREROIGKDAGBsXib5m3azT8NNiIg0KcKDIIPaeseiQg03ISLSlIgOgjO7tad1fKzaCUREjqNZQWBmM8xstZmtMrNZZpZoZrlmttjMCszsBTNrFahiT1VCXCwje6Tp8pUiIsdx2kFgZtnAncBw59xAIBaYDvwaeNA51wvYDdwSiEJP19i8TIp27mfLrgNeliEiEraae2goDmhtZnFAErANOB/4u//xZ4CpzVxHs4zr7etG+kmB9gpERBpz2kHgnCsFfgtsxhcAVcBSoNI5d6SbTgmQ3djzzew2M1tiZkuCeTHqnpnJdEpJVDuBiEgTmnNoqD0wBcgFOgNtgIkn+3zn3Ezn3HDn3PDMzMzTLeOEfMNNZPDJBg03ISLSmOYcGroA2OicK3fO1QCzgdFAqv9QEUAOUNrMGpttbF4mew7VsrKk0utSRETCTnOCYDMw0sySzMyACcAaYC5wpX+eG4DXmldi843ulYEZ6j0kItKI5rQRLMbXKJwPfO5f1kzgHuD7ZlYApANPBKDOZklr04pB2SlqJxARaUTciWdpmnPufuD+YyYXASOas9xgGJuXwWMfFbHnUA3tEuO9LkdEJGxE9JnFDY3Ny6Su3rFQw02IiBwlaoJgWNf2JLXScBMiIseKmiBoFRfDOT3S1WAsInKMqAkC8LUTbKo4wKaK/V6XIiISNqIrCHr7TlzTXoGIyL9EVRD0yGhDdmprtROIiDQQVUFwZLiJBQUV1NbVe12OiEhYiKogAF830r3VtazQcBMiIkAUBsHoXunEGMxbr3YCERGIwiBITWrF4JxUtROIiPhFXRAAjMvLYPmWSqoO1nhdioiI56IyCMb2zqTewcJCHR4SEYnKIBjSJZXkhDjm6XwCEZHoDIL42BjO6ZnOvPXlOKerlolIdIvKIABfO0HJ7oNsqjjgdSkiIp6K2iAYm3dkuAn1HhKR6Ba1QdAtPYkuaa3VTiAiUS9qg8A33EQmCwsrqNFwEyISxaI2CMDXTrCvupblWzTchIhEr6gOgnN6ZhBj8PF6tROISPSK6iBIaR3PkC6paicQkagW1UEAvt5DK0sqqTxw2OtSREQ8EfVBMK53BvUOFhRWeF2KiIgnoj4IzshJpW1CnM4nEJGoFfVBEBcbw6he6cxbv1PDTYhIVIr6IABfO0Fp5UGKdu73uhQRkZBTEADj+/iGm3hr1XaPKxERCT0FAZDTPokR3dN4Ob9Eh4dEJOooCPyuODObovL9rCip8roUEZGQUhD4XTKoEwlxMczOL/G6FBGRkFIQ+LVLjOeiAR2Zs2Irh2s1CJ2IRA8FQQPThmVTeaCGuV/s8LoUEZGQURA0MLZXBhnJCTo8JCJRRUHQQFxsDFOHdOaDdTvYvV9jD4lIdGhWEJhZqpn93czWmdlaMzvHzNLM7F0z2+D/3T5QxYbCtGE51NQ5/rlyq9eliIiERHP3CB4C3nLO9QXOANYC9wLvO+fygPf991uM/p3b0bdjW17OL/W6FBGRkDjtIDCzFGAc8ASAc+6wc64SmAI845/tGWBqc4sMtSuG5bB8SyWF5fu8LkVEJOias0eQC5QDT5nZMjP7i5m1AbKcc9v882wHshp7spndZmZLzGxJeXl4jfw5ZUhnYgxe0V6BiESB5gRBHDAMeNQ5NxTYzzGHgZxvvIZGx2xwzs10zg13zg3PzMxsRhmB16FdImPzMnllWSn19RpyQkQiW3OCoAQocc4t9t//O75gKDOzTgD+3y2yU/60YdmUVh5k8cZdXpciIhJUpx0EzrntwBYz6+OfNAFYA8wBbvBPuwF4rVkVeuSi/h1JTojTOQUiEvGa22vou8BzZrYSGAL8N/Ar4EIz2wBc4L/f4rRuFcukQR154/NtHDhc63U5IiJBE9ecJzvnlgPDG3loQnOWGy6mDcvhxSUlvLO6jKlDs70uR0QkKHRm8XGM6J5GdmprXtbhIRGJYAqC44iJMa4Yls38gp1srzrkdTkiIkGhIDiBrwzLod7Ba8t1ToGIRCYFwQnkZrRhWNdUXcZSRCKWguAkTBuWw/qyfazeusfrUkREAk5BcBImD+5Eq9gYZmvICRHxWH29Y/XWKp74ZGPAltms7qPRIjWpFRP6dWDOilJ+NKkv8bHKTxEJjfp6xxdle1lYWMGiogoWb9xF1cGagK5DQXCSpg3L4c1V2/l4Qznn9210HD0RkWZzzrG+bB+LiipYWFjB4o0V7D7g++DvktaaiwdkMbJHOiN7pJP968CsU0Fwks7tnUlam1a8nF+qIBCRgHHOUVi+j4WFFSwsqmBx0S4q/FdIzE5tzYR+Rz7408hpnxSUGhQEJ6lVXAyXn9GZv326maqDNaS0jve6JBFpgZxzFFccYH7BThYVVbCoaBc791UD0CklkXN7ZzKyZzrn9EinS1pwPviPpSA4BdOGZfP0gmLe+Hwb14zo6nU5ItJC7Np/mPkFO/lkw04+KdhJaeVBALLaJTCml+8wzzk90+maloSZhbw+BcEpGJSdQq8Oyby8tERBICJNOlRTx9JNu/l4w04+KShn9dY9OAdtE+MY1TOdb53bg9G9MsjNaOPJB/+xFASnwMyYNiyb37z1BZsq9tMtvY3XJYlIGKivd6zdvufLb/yfbtxFdW09cTHGsK7tmXFBb8bkZTA4O4W4MOx1qCA4RVOHZPM/b3/B7PxSZlzY2+tyRMQj26oO+r7xb9jJ/IKdXzbw5nVI5mtnd2VsXgZn56bTJiH8P2bDv8Iw0zm1NaN6pjN7WQnfuyAvLHbrRCT4auvqWbppN++tLeODdTsoLN8PQEZyAuN6ZzK6VwZjemXQMSXR40pPnYLgNEwbmsMPXlrBkk27Oat7mtfliEiQ7KuuZd76ct5bU8YHX+yg8kANrWJjOLtHGteM6MqYvAz6ZLVt8V8IFQSnYeLAjvy/11YxO79EQSASYbZWHuT9tWW8u3YHiworOFxXT2pSPOf37cCF/bIY2zuT5BZwuOdURNarCZE2CXFMHNiRf67cxv2XDSAxPtbrkkTkNDnnWL11D++sKeO9NWWs2eYbXLJHRhtuHN2dC/plMaxralg28gaKguA0XTEsh9n5pby3tozJgzt7XY6InIJDNXUsLKrgvTVlvL92B9v3HCLGYHi3NH48qS8T+mXRMzPZ6zJDRkFwmkb2SKdTSiKz80sVBCItQNXBGj5YV8bbq8qYt6GcA4frSGoVy7m9M7mgXxbn9e1AWptWXpfpCQXBaYqNMaYOzWbmvCLK91aT2TbB65JE5Bg79hzinTVlvL16OwsLK6itd2S1S2DasGwu8I/ho0O7CoJmmTY0m0c/LGTOiq3cMibX63JEBNhUsZ+3V2/n7dVl5G/ejXO+Kw1+Y2wPLh6QxRk5qcTEtOxePoGmIGiGvKy2DM5JYXZ+iYJAxCPOOdZt38vbq7fz1qrtrNu+F4ABndsx44LeTBzYkbwOyS2+i2cwKQiaadrQbB74xxrWbd9D347tvC5HJCrU1zuWbdnN26vLeGvVdjbvOoAZnNUtjZ9e2o+LB3QM2cidkUBB0EyXndGZ/3p9La/kl/KjSQoCkWCpq3csKqrgjc+38e6aMnbsrSY+1hjVM4Pbx/fkgn5Zaqs7TQqCZkpPTmB8nw68sqyU/7i4T0T3NRYJtSN9/F9dVsqcFVvZsbea1vGxnNc3k4sHdOS8vh1ol6hrgzSXgiAArhiWzXtry5hfWMG5vTO9Lkekxduy6wCvLivl1eWlFJbvJz7WGN+nA1OHZDOhXwf19AkwBUEAnN+vA+0S45idX6IgEDlNu/Yf5vWVW3l1+VaWbtoNwIjuadw8JpdLB3UiNSk6+/iHgoIgABLiYrnsjM68nF9C5YHD2mBFTtLBw3W8u7aM15aV8tH6cmrrHb2zkvnhxD5cfkbnoF2jV46mIAiQ68/pxnOLN/PMgk3cdUGe1+WIhK3aunoWFFbw6vJS3l61nf2H6+jYLpFbxuQyZUg2/Tq1/NE8WxoFQYD07diOCX078PSCjdw6LpekVnprRRpavbWKl5f6Gn137qumbWIckwd3ZsrQzpydm06sTvLyjD6tAujb5/XkikcX8vynW7hZJ5iJsPdQDa8t38oLn23h89IqWsXGcF7fTKYOyea8vmr0DRcKggA6s1saI7qn8eePi7huZDdaxakrqUQf5xxLN+3m+c+28PrKbRysqaNvx7b87PIBTBnSWW1oYUhBEGC3n9eTm576jFeXl3LV8C5elyMSMhX7qpmdX8rzn22msHw/bVrFMnVoNtPP6sLgnBQd9w9jCoIAG987k/6d2vHYR4VcOSxHg1tJRKuvd3xSsJMXPtvCO2u2U1PnGNY1ld9cMZhLB3dqERdulwAEgZnFAkuAUufcZDPLBZ4H0oGlwPXOucPNXU9LYWbcPr4n3521jHfWbGfiwE5elyQScNuqDvLSkhJe+GwLpZUHaZ8Uz9fP6c7VZ3Whd1Zbr8uTUxSIuL4LWAscGWjn18CDzrnnzewx4Bbg0QCsp8W4ZGBHuqUn8ciHhVw8oKN2iSUi1NTV88G6HTz/6WY+Wl9OvYMxvTK495K+XDQgi4Q4Nfy2VM0KAjPLAS4FfgF833yfeOcDX/PP8gzwAFEWBHGxMXxzXE9+/MrnLCisYHSvDK9LEjlt5XureXbRJv726WbK91aT1S6Bb4/vxVXDu9A1XSd8RYLm7hH8HvghcGRfMB2odM7V+u+XANmNPdHMbgNuA+jatWszywg/V5yZze/fW88jHxYoCKRFWr21iic/KeYfK7ZyuK6e8/pkcu3Z3RjfJ1ODK0aY0w4CM5sM7HDOLTWz8af6fOfcTGAmwPDhw93p1hGuEuJiuWVMLr98cx0rtlRyRpdUr0sSOaG6esf7a8t4cv5GFhXtonV8LNNHdOHGUd3pEUUXc482zdkjGA1cbmaTgER8bQQPAalmFuffK8gBSptfZst07chuPDy3gEc+LODx64d7XY5Ik/ZV1/LiZ1t4ekExm3cdIDu1NT+e1Jerh3clJUnDPEe60w4C59yPgB8B+PcI7nbOXWtmLwFX4us5dAPwWgDqbJGSE+K4YVR3/vhBAQU79tKrg3pTSHjZsusATy8o5sXPtrC3upbh3dr7Gn/7Z+nwTxQJRiffe4Dnzey/gGXAE0FYR4tx46ju/PnjIh77qIjffvUMr8sRwTnHpxt38eT8jby7powYMy4d3ImbRucyRIcwo1JAgsA59yHwof92ETAiEMuNBOnJCUw/qyvPLtrEjAt7k53a2uuSJEpV19bxzxXbeHL+RlZv3UNqUjy3j+/J9SO70zEl0evyxEM67S8Ebh3Xg2cXbeLP84p44PIBXpcjUabqQA1/XVjMXxdtonxvNb06JPPfXxnEV4Zm07qV+v6LgiAkslNbM2VINs9/tpk7J+SR1kaDbknw7dxXzROfbOT/Fm5iX3Ut5/bO5Jav5jI2L0MnOcpRFAQhcvv4HrycX8LT8zfy/Yv6eF2ORLBtVQeZOa+IWZ9uprq2nksHdeKO83rRr1O7Ez9ZopKCIER6dWjLRf2zeHpBMbed25NkDcYlAba54gCPflTI35duwTmYOjSb28f3pKf6/8sJ6NMohL59Xi/eWVPG3xZv4rZxPb0uRyJEwY69PDK3kNdWbCXWjKvP6sI3x/WkS5qGf5CToyAIoSFdUhnVM52/fLyRG0Z11yBd0iyrt1bx8NwC3ly1ncS4WG4a1Z1bx/Ugq516AMmpURCE2O3je3L9E58yO7+Ua0ZE3hhLEnxLN+3m4bkFfLBuB20T4rhjfC9uHpOrTghy2hQEITamVwaDslN4/KNCrhreRRfslpPinGNhUQV/+qCABYUVtE+K5+6LenP9Od1Jaa0hIKR5FAQhZmZ8e3xPbn8unzdXbWPy4M5elyRhzDnHvA07eei99eRvriSzbQI/vbQf14zoqqt/ScBoS/LAxQM60iOzDY/MLeTSQZ3Up1satXprFb98Yx2fFOwkO7U1/zl1IF89M4fEeLUtSWApCDwQE2N8a1xPfvjySj5aX874Ph28LknCyLaqg/zvO+t5Ob+ElNbx3De5P9eN7EarOA0CJ8GhIPDI1KHZPPjeeh75sFBBIIBvKOjHPyrkzx8XUV8Pt47twR3je2kYaAk6BYFHWsXF8I2xPfjPf65h6aZdnNktzeuSxCO1dfU8/9kWfv/eenbuO8zlZ3TmPy7uo/MAJGS0r+mh6Wd1ITUpnkc/LPS6FPGAc76rgU186GN++uoqemQk8+odo/nDNUMVAhJS2iPwUJuEOG4c1Z3fv7eBL7bvpU9HXbgmWqwqreIXr69lYVEFuRltePz6M7mof5Y6DogntEfgsRtHdSepVSyPfljgdSkSAqWVB/n+C8uZ/MdP+KJsLz+7fADvzBjHxQM6KgTEM9oj8FhqUiu+NqIrTy0o5gcX6bhwpNpzqIZHPyzkiU82Ar4zzG8f35N2iWoIFu9pjyAM3DI2lxiDmfOKvC5FAqymrp6/Lixm/P98yKMf+s4b+eAH53LPxL4KAQkb2iMIA51SWjNtaA4vLtnCnRPyyGyb4HVJEgBLN+3ix7NX8UXZXkb2SOMnk/ozKCfF67JE/o32CMLEN8/tweG6ep6cv9HrUqSZqg7U8KPZn3PFowvZe6iGx647k1m3jlQISNjSHkGY6JGZzOTBnXlq/kauG9lNF7lvgZxzzFmxlf/85xp2H6jhG2NymXFhb40JJGFPewRh5J6JvktY/uL1NR5XIqeqeOd+rn/iU+56fjnZ7ZOY853R/HRyf4WAtAjaSsNITvsk7hjfi/99dz2fbNjJmLwMr0uSE6iurePxj4r409wCEmJj+PmUAVx7djcNLy4tivYIwsyt43rQLT2J++es4nBtvdflyHEsKqpg0kMf87t313Nh/yze+8G5fP2c7goBaXEUBGEmMT6W+y/rT2H5fp5eoIbjcLRr/2HufmkF02cuorq2nqduOouHvzZMl4iUFkuHhsLQ+X2zmNC3Aw+9t4EpQ7L1ARMmnHO8tLSEX76xlr2Harl9fE/uPD+P1q10fQBp2bRHEKbuu6w/NXWOX76x1utSBCjYsY/pMxfxw7+vpEdmMq/fOZZ7JvZVCEhE0B5BmOqW3oZvntuDP35QwDUjunJ2j3SvS4pKh2rqeHhuAY99VEjr+Fh+OW0QVw/vQozaASSCaI8gjH17fC+yU1tz/5zV1Nap4TjUlhTvYuLv5/HHDwqYPLgzH9w9nmtGdFUISMRREISx1q1i+eml/Vi3fS/PLd7sdTlRo7aungffXc9Vjy+kzjmeveVsHrx6CBnJGvpDIpMODYW5iQM7MqZXBr995wsuHdxJH0ZBtmXXAe56fhn5myuZNjSbn00ZQFsNDicRTnsEYc7MeODyARw8XMdv3lrndTkR7ZVlJVzy0MdsKNvHQ9OH8LurhygEJCooCFqAXh2SuWVMLi8uKWHZ5t1elxNx9hyq4a7nlzHjhRX069SWN+4ay5Qh2V6XJRIyCoIW4rsT8ujQNoH7XltNXb3zupyIsaR4F5Me+ph/rtzG9y/szaxbR+riQBJ1TjsIzKyLmc01szVmttrM7vJPTzOzd81sg/93+8CVG72SE+L4yaX9+Ly0iheXbPG6nBavYYOwGbz4zXO4c0IecbH6biTRpzlbfS3wA+dcf2AkcIeZ9QfuBd53zuUB7/vvSwBcfkZnRnRP4zdvraPywGGvy2mxtuw6wFWPL+Sh9zcwdUg2b9w5ljO76fuKRK/TDgLn3DbnXL7/9l5gLZANTAGe8c/2DDC1uUWKj5nxsykDqDpYw2/f+cLrclqkLxuEd6hBWOSIgOwHm1l3YCiwGMhyzm3zP7QdyGriObeZ2RIzW1JeXh6IMqJCv07t+Po53Xlu8WZWlVZ5XU6LcWyD8JtqEBb5UrODwMySgZeB7znn9jR8zDnngEZbNp1zM51zw51zwzMzM5tbRlSZcWFv0pJacd9rq6hXw/EJLSnexSW/P7pBOKe9GoRFjmhWEJhZPL4QeM45N9s/uczMOvkf7wTsaF6JcqyU1vHcc0lf8jdX8sqyUq/LCVu1dfX8zt8gHBMDL31LDcIijWlOryEDngDWOud+1+ChOcAN/ts3AK+dfnnSlCuH5TCkSyq/fHMdew7VeF1O2Nmx5xBXz1zEHxo0CA/rqgZhkcY056vRaOB64HwzW+7/mQT8CrjQzDYAF/jvS4DFxBg/nzKAiv3VPPTeBq/LCSsrSyq5/E/zWbN1jxqERU7CaY815Jz7BGhqGMYJp7tcOXmDc1KZflZXnl5QzFXDu9CnY1uvS/LcnBVb+Y+XVpCRnMDLt4+if+d2XpckEvZ0sLSF+4+L+5CcEMf9c1bha5uPTvX1jt++/QV3zlrG4JwUXvvOaIWAyElSELRwaW1acffFfVhUtIt/rtx24idEoP3VtXzr2aX8aW4BVw/vwnPfGKlRWkVOgYIgAnxtRFcGdG7HL15fy/7qWq/LCaktuw5wxaMLeG9tGfdf1p9fXTGIVnHarEVOhf5jIkCsv+F4+55D/GlugdflhMziogqmPDyfrZUHefqmEdw0OhdfZzYRORUKgghxZrc0rhiWw18+LqKwfJ/X5QTdrE83c+1fFpPaOp5X7xj2p9hVAAALq0lEQVTNuN46KVHkdCkIIsi9l/QlMT6W77+4gsO1kXmN49q6eh6Ys5ofzf6cUb0yeOWO0fTITPa6LJEWTUEQQTLbJvCbKwazYksl//3GWq/LCbjKA4e58anPeHpBMd8Yk8uTNwwnpbXODxBpLl2zOMJcMqgTN43uzlPzixmRm8akQZ28LikgCnbs5RvPLKG08iC/uXIwVw3v4nVJIhFDewQR6EeX9GNIl1R++PeVFO/c73U5zTb3ix185eEF7KuuZdatIxUCIgGmIIhAreJiePjaYcTFGrc/l8+hmjqvSzotzjn+PK+IW57+jJy0JF77zhiGd0/zuiyRiKMgiFDZqa353VVnsHbbHn72j9Vel3PKqmvruPullfzijbVcPKAjL99+Dtmprb0uSyQiKQgi2Pl9s7h9fE9mfbqF2fklXpdz0vYcquH6v3zKy/kl3DUhj4e/NoykVmrOEgkW/XdFuB9c2Julm3bzk1dWMTA7hd5Z4T0wXcW+ar7+5KesL9vLH64ZyuVndPa6JJGIpz2CCBcXG8OfrhlKm4RYvv1cflgPQbGt6iBXPb6Qgh37mPn14QoBkRBREESBDu0S+cP0oRSW7+Mnr3welqOUbty5nysfXUjZnmr+evMIzuvTweuSRKKGgiBKjOqVwYwLevPq8q3M+nSL1+UcZd32PXz1sYUcrKlj1q0jObtHutcliUQVBUEU+c55vRibl8ED/1jNqtIqr8sBIH/zbq5+fBGxMfDiN0cyKCfF65JEoo6CIIrExBi/v3oIaUmtuONv+Z5f63hBwU6u+8tiUpPi+fu3RtGrQ3g3ZItEKgVBlElPTuBPXxtKye6D3PP3lZ61F7yzejs3Pv0ZXdon8dI3z6FLWpIndYiIgiAqDe+exr0T+/Lmqu08Nb845Ot/ZVkJtz+XT79O7XjhmyPp0C4x5DWIyL8oCKLUN8bmcmH/LP77jbXkb94dsvX+38JiZrywghHd03juG2eTmtQqZOsWkcYpCKKUmfHbK8+gY0oi3/3bMnbvPxz0dT48t4D/99pqLujXgaduOovkBJ3PKBIOFARRLCUpnkeuHUb53mq+/+Jy6uuD017gnONXb67jf97+gqlDOvPodWeSGB8blHWJyKlTEES5wTmp/L/J/Zj7RTmPzSsM+PLr6x0/fXUVj31UyLVnd+V3Vw0hPlabnUg40X+kcN3Iblx2Rmd++/YXLCqqCNhya+rqmfHicp5bvJnbx/fkv6YOJCZGF5cXCTcKAsHM+OW0QXRPb8Ods5ZRvre62cs8VFPHt/5vKa8t38oPJ/bhnol9MVMIiIQjtdYJAMkJcTxy3TCmPjyf787K58ZRuQCYgeELiyMf42ZHpvsf5Oh5zOCRuYUs2ljBf04dyPUju3nwikTkZCkI5Et9O7bjv6YO4u6XVrCoaFezlhUbYzx41RCmDs0OUHUiEiwKAjnKlWfmMLJHGnsO1uLw9SI6cvKxc+Bw/7qPr0eQ+3Kef82f1S5RZwuLtBAKAvk3Oe2ToL3XVYhIqKixWEQkyikIRESinIJARCTKKQhERKKcgkBEJMoFJQjMbKKZfWFmBWZ2bzDWISIigRHwIDCzWOBh4BKgP3CNmfUP9HpERCQwgrFHMAIocM4VOecOA88DU4KwHhERCYBgnFCWDWxpcL8EOPvYmczsNuA2/91qM1sVhFoCLQPY6XURJ0F1Bk5LqBFUZ6C1lDr7BGIhnp1Z7JybCcwEMLMlzrnhXtVyslRnYLWEOltCjaA6A60l1RmI5QTj0FAp0KXB/Rz/NBERCUPBCILPgDwzyzWzVsB0YE4Q1iMiIgEQ8ENDzrlaM/sO8DYQCzzpnFt9gqfNDHQdQaI6A6sl1NkSagTVGWhRVac5F5wLlouISMugM4tFRKKcgkBEJMqFNAjMrNjMPjez5Y11ezKfP/iHplhpZsNCWZ+/hj7++o787DGz7x0zz3gzq2owz30hqu1JM9vR8JwLM0szs3fNbIP/d6OXlDGzG/zzbDCzGzyo83/MbJ3/7/qKmaU28dzjbiNBrvEBMytt8Hed1MRzQzaEShN1vtCgxmIzW97Ec0PyXvrX1cXM5prZGjNbbWZ3+aeH1fZ5nDrDbftsqs7gbKPOuZD9AMVAxnEenwS8ie9a6COBxaGsr5F6YoHtQLdjpo8H/ulBPeOAYcCqBtN+A9zrv30v8OtGnpcGFPl/t/ffbh/iOi8C4vy3f91YnSezjQS5xgeAu09imygEegCtgBVA/1DWeczj/wvc5+V76V9XJ2CY/3ZbYD2+IWbCavs8Tp3htn02VWdQttFwOzQ0Bfir81kEpJpZJw/rmQAUOuc2eVjDl5xz84Bjryo/BXjGf/sZYGojT70YeNc5t8s5txt4F5gYyjqdc+8452r9dxfhO7/EM028lycjpEOoHK9OMzPgKmBWsNZ/spxz25xz+f7be4G1+EYZCKvts6k6w3D7bOr9PBmnvI2GOggc8I6ZLTXfEBPHamx4ipN98cEwnab/yc4xsxVm9qaZDQhlUcfIcs5t89/eDmQ1Mk+4va8349vza8yJtpFg+47/8MCTTRzGCKf3cixQ5pzb0MTjnryXZtYdGAosJoy3z2PqbCists9G6gz4NhrqIBjjnBuGb2TSO8xsXIjXf9LMdzLc5cBLjTycj+9w0RnAH4FXQ1lbU5xvvzCs+wOb2U+AWuC5Jmbxcht5FOgJDAG24TvsEs6u4fh7AyF/L80sGXgZ+J5zbk/Dx8Jp+2yqznDbPhupMyjbaEiDwDlX6v+9A3gF3y5MQ+E0PMUlQL5zruzYB5xze5xz+/y33wDizSwj1AX6lR05fOb/vaORecLifTWzG4HJwLX+D4V/cxLbSNA458qcc3XOuXrgz02sO1zeyzhgGvBCU/OE+r00s3h8H1rPOedm+yeH3fbZRJ1ht302VmewttGQBYGZtTGztkdu42ucOXbE0TnA181nJFDVYLcy1Jr8tmVmHf3HZzGzEfjex4oQ1tbQHOBIL4sbgNcamedt4CIza+/flbzIPy1kzGwi8EPgcufcgSbmOZltJJg1NmyP+koT6w6XIVQuANY550oaezDU76X//+EJYK1z7ncNHgqr7bOpOsNt+zxOncHZRoPd+t2gJbsHvtbrFcBq4Cf+6d8CvuW/bfgualMIfA4MD1V9x9TaBt8He0qDaQ3r/I7/NazA17A0KkR1zcK3O1iD77jfLUA68D6wAXgPSPPPOxz4S4Pn3gwU+H9u8qDOAnzHLZf7fx7zz9sZeON420gIa/w//3a30v+P0+nYGv33J+HrxVEYzBqbqtM//ekj22ODeT15L/3rG4PvsM/KBn/jSeG2fR6nznDbPpuqMyjbqIaYEBGJcuHWfVREREJMQSAiEuUUBCIiUU5BICIS5RQEIiJRTkEgUcU/euPdx3n8aTO70n/7QzML+wuYizSXgkAkQMws1usaRE6HgkAinpn9xMzWm9knQB//tCFmtqjB+PONjpPfYBmPmtkS/9jwP2swvdjMfm1m+cBXzexO/xjyK83s+eC+MpHACPjF60XCiZmdie8U+yH4tvd8YCnwV+C7zrmPzOznwP3A95pckO/szF3+b/3vm9lg59xK/2MVzjcQGWa2Fch1zlVbExc3EQk32iOQSDcWeMU5d8D5Rm+cg28IkVTn3Ef+eZ7BdwGY47nK/61/GTAA30VCjmg48NtK4Dkzuw7fKJYiYU9BIHICZpYL3A1McM4NBl4HEhvMsr/B7UvxjZc1DPjMP0qoSFhTEEikmwdMNbPW/pEjL8P3wb3bzMb657ke+KipBQDt/M+pMrMsfEOU/xsziwG6OOfmAvcAKUByYF6GSPDo24pENOdcvpm9gG/EyB34hugF35DIj5lZEr5r5N50nGWsMLNlwDp8I1TOb2LWWOBZM0vBN5LuH5xzlYF5JSLBo9FHRUSinA4NiYhEOQWBiEiUUxCIiEQ5BYGISJRTEIiIRDkFgYhIlFMQiIhEuf8P8dv0YFxxz9wAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "eval_df = bq.query(eval_query, project=PROJECT).to_dataframe()\n",
    "ax = eval_df.plot(x='dollars', y='MAPE');\n",
    "ax.set_xlim(5, 25)\n",
    "ax.set_ylim(0,100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the error is quadratic -- it decreases and then increases with fare amount"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature engineering\n",
    "\n",
    "Let's create some features that will improve our prediction result:\n",
    "<ol>\n",
    "<li> Compute distance between pickup and dropoff points as ST_Distance\n",
    "<li> Do a feature cross of day-hour combination to learn traffic\n",
    "<li> Do a feature cross of pickup-droff points to learn tolls\n",
    "</ol>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_input_dataset_fc(split, sample=1000):\n",
    "  \"\"\"\n",
    "  split is TRAIN or EVAL\n",
    "  sample=1000 pulls 1/1000 of full dataset\n",
    "  \"\"\"\n",
    "  \n",
    "  query=\"\"\"\n",
    "WITH params AS (\n",
    "  SELECT\n",
    "  0.1 AS RES,\n",
    "  1 AS TRAIN,\n",
    "  2 AS EVAL\n",
    "  ),\n",
    "\n",
    "daynames AS\n",
    "  (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
    "  \n",
    "taxitrips AS (\n",
    "SELECT\n",
    "  (tolls_amount + fare_amount) AS total_fare,\n",
    "  daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
    "  EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
    "  ST_GeogPoint(pickup_longitude, pickup_latitude) AS pickup,\n",
    "  ST_GeogPoint(dropoff_longitude, dropoff_latitude) AS dropoff,\n",
    "  passenger_count AS passengers\n",
    "FROM\n",
    "  `nyc-tlc.yellow.trips`, daynames, params\n",
    "WHERE\n",
    "  trip_distance > 0 AND fare_amount > 0 AND fare_amount < 100\n",
    "  and fare_amount >= 2.5 and pickup_longitude > -78 and pickup_longitude < -70 \n",
    "      and dropoff_longitude > -78 and dropoff_longitude < -70 and pickup_latitude > 37 \n",
    "      and pickup_latitude < 45 and dropoff_latitude > 37 and dropoff_latitude < 45 \n",
    "      and passenger_count > 0\n",
    "  AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), {0})) = params.{1}\n",
    "),\n",
    "\n",
    "feateng AS (\n",
    "  SELECT \n",
    "    total_fare,\n",
    "    ST_Distance(pickup, dropoff) AS euclidean,\n",
    "    CONCAT(dayofweek, CAST(hourofday AS STRING)) AS dayhr_fc,\n",
    "    CONCAT(ST_AsText(ST_SnapToGrid(pickup, params.RES)),\n",
    "           ST_AsText(ST_SnapToGrid(dropoff, params.RES))) AS loc_fc\n",
    "  FROM\n",
    "    taxitrips, params\n",
    ")\n",
    "  \n",
    "  SELECT *\n",
    "  FROM feateng\n",
    "  \"\"\".format(sample, split)\n",
    "  \n",
    "  return query"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train:  this will take about <b> 5-10 minutes </b>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "CREATE or REPLACE MODEL demos.taxifare_model_fc\n",
      "OPTIONS\n",
      "  (model_type='linear_reg', labels=['total_fare'], min_rel_progress=0.005, l2_reg=0.1) AS\n",
      "  \n",
      "  \n",
      "WITH params AS (\n",
      "  SELECT\n",
      "  0.1 AS RES,\n",
      "  1 AS TRAIN,\n",
      "  2 AS EVAL\n",
      "  ),\n",
      "\n",
      "daynames AS\n",
      "  (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
      "  \n",
      "taxitrips AS (\n",
      "SELECT\n",
      "  (tolls_amount + fare_amount) AS total_fare,\n",
      "  daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
      "  EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
      "  ST_GeogPoint(pickup_longitude, pickup_latitude) AS pickup,\n",
      "  ST_GeogPoint(dropoff_longitude, dropoff_latitude) AS dropoff,\n",
      "  passenger_count AS passengers\n",
      "FROM\n",
      "  `nyc-tlc.yellow.trips`, daynames, params\n",
      "WHERE\n",
      "  trip_distance > 0 AND fare_amount > 0 AND fare_amount < 100\n",
      "  and fare_amount >= 2.5 and pickup_longitude > -78 and pickup_longitude < -70 \n",
      "      and dropoff_longitude > -78 and dropoff_longitude < -70 and pickup_latitude > 37 \n",
      "      and pickup_latitude < 45 and dropoff_latitude > 37 and dropoff_latitude < 45 \n",
      "      and passenger_count > 0\n",
      "  AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 1000)) = params.TRAIN\n",
      "),\n",
      "\n",
      "feateng AS (\n",
      "  SELECT \n",
      "    total_fare,\n",
      "    ST_Distance(pickup, dropoff) AS euclidean,\n",
      "    CONCAT(dayofweek, CAST(hourofday AS STRING)) AS dayhr_fc,\n",
      "    CONCAT(ST_AsText(ST_SnapToGrid(pickup, params.RES)),\n",
      "           ST_AsText(ST_SnapToGrid(dropoff, params.RES))) AS loc_fc\n",
      "  FROM\n",
      "    taxitrips, params\n",
      ")\n",
      "  \n",
      "  SELECT *\n",
      "  FROM feateng\n",
      "  \n",
      "  \n",
      "  \n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<google.cloud.bigquery.table._EmptyRowIterator at 0x7fa39d3244e0>"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_query = create_train_query( create_input_dataset_fc('TRAIN'), 'demos.taxifare_model_fc' )\n",
    "print(train_query)\n",
    "bq.query(train_query, project=PROJECT).result()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Evaluate the model once it is trained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT iteration, loss from ML.TRAINING_INFO(MODEL demos.taxifare_model_fc)\n",
      "   iteration      loss\n",
      "0          0  3.895943\n"
     ]
    }
   ],
   "source": [
    "show_training_loss('demos.taxifare_model_fc')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_absolute_error</th>\n",
       "      <th>mean_squared_error</th>\n",
       "      <th>mean_squared_log_error</th>\n",
       "      <th>median_absolute_error</th>\n",
       "      <th>r2_score</th>\n",
       "      <th>explained_variance</th>\n",
       "      <th>rmse</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.229484</td>\n",
       "      <td>24.638316</td>\n",
       "      <td>0.066012</td>\n",
       "      <td>1.416893</td>\n",
       "      <td>0.714704</td>\n",
       "      <td>0.714706</td>\n",
       "      <td>4.9637</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
       "0             2.229484           24.638316                0.066012   \n",
       "\n",
       "   median_absolute_error  r2_score  explained_variance    rmse  \n",
       "0               1.416893  0.714704            0.714706  4.9637  "
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eval_query = create_eval_query( create_input_dataset_fc('EVAL'), 'demos.taxifare_model_fc' )\n",
    "eval_df = bq.query(eval_query, project=PROJECT).to_dataframe()\n",
    "eval_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0, 100)"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAGtVJREFUeJzt3X2QHPV95/H3dx72UdLu6gFZ0gojDoyRhSzwHhFnTPmCy8GyAF3sI6JMIiPqZFfFwSZF2RyuAuIq14UkF86+5Bwr4CAf3CKw8SGDg8NxYHOuoFgLkhAPjoS8QguSENKuJLTax/neH92zGi07q2XncfX7vKqmuvvXD/Od3t7+THfP9Ji7IyIi4UpUugAREaksBYGISOAUBCIigVMQiIgETkEgIhI4BYGISOBOGwRm9gMze9vMduS0zTSzp8xsZ9xtidvNzL5rZrvMbLuZXVLK4kVEpHATOSK4H7hqVNttwNPufj7wdDwM8Bng/PixDvheccoUEZFSOW0QuPsvgcOjmq8FNsT9G4BVOe0/9MjzQLOZzStWsSIiUnypSc431933xf37gblx/wJgb850XXHbPkYxs3VERw00NjZ+7MMf/vAkSxERCVNHR8c77j6n0OVMNghGuLub2fu+T4W7rwfWA7S1tfmWLVsKLUVEJChmtqcYy5nsp4YOZE/5xN234/Y3gYU507XGbSIiUqUmGwSbgDVx/xrgsZz2P4o/PbQcOJJzCklERKrQaU8NmVk78Elgtpl1AXcCfw48bGY3AXuA6+LJfwasAHYBvcCNJahZRESK6LRB4O7X5xl15RjTOvDHhRYlIjLa4OAgXV1d9PX1VbqUsqurq6O1tZV0Ol2S5Rd8sVhEpBy6urqYPn0655xzDmZW6XLKxt05dOgQXV1dLFq0qCTPoVtMiMiU0NfXx6xZs4IKAQAzY9asWSU9ElIQiMiUEVoIZJX6dSsIREQCpyAQEZkgM+OGG24YGR4aGmLOnDmsXLnylOlWrVrF8uXLT2m76667WLBgAcuWLWPJkiVs2rTpPe3ZR09PT+lfTA4FgYjIBDU2NrJjxw5OnDgBwFNPPcWCBQtOmaanp4eOjg6OHDnC7t27Txl3yy23sHXrVh555BHWrl1LJpM5pT37aG5uLs8LiikIRETehxUrVvDEE08A0N7ezvXXn/oJ+0cffZSrr76a1atX89BDD425jAsvvJBUKsU777xT8nonQh8fFZEp589++jKvvHW0qMtcPH8Gd179kdNOt3r1ar71rW+xcuVKtm/fztq1a3nuuedGxre3t3PHHXcwd+5cPve5z3H77be/ZxmbN28mkUgwZ050v7h77rmHBx54AICWlhaeeeaZIr2qiVEQiIi8D0uXLqWzs5P29nZWrFhxyrgDBw6wc+dOLr/8csyMdDrNjh07WLJkCXByhz99+nQ2btw48mmgW265hVtvvbXsryVLQSAiU85E3rmX0jXXXMOtt97Ks88+y6FDh0baH374Ybq7u0e++HX06FHa29v59re/DVR+h5+PrhGIiLxPa9eu5c477+Siiy46pb29vZ0nn3ySzs5OOjs76ejoyHudoJooCERE3qfW1lZuvvnmU9o6OzvZs2fPKR8bXbRoEU1NTWzevHnc5d1zzz2nfHy0s7OzFGXnZdF94ipLP0wjIqfz6quvcuGFF1a6jIoZ6/WbWYe7txW6bB0RiIgETkEgIhI4BYGITBnVcCq7Ekr9uhUEIjIl1NXVcejQoeDCIPt7BHV1dSV7Dn2PQESmhNbWVrq6ujh48GClSym77C+UlYqCQESmhHQ6XbJf6AqdTg2JiAROQSAiEjgFgYhI4BQEIiKBUxCIiAROQSAiEjgFgYhI4BQEIiKBUxCIiAROQSAiEjgFgYhI4BQEIiKBUxCIiAROQSAiEjgFgYhI4BQEIiKBKygIzOwWM3vZzHaYWbuZ1ZnZIjPbbGa7zGyjmdUUq1gRESm+SQeBmS0Abgba3H0JkARWA3cD97j7eUA3cFMxChURkdIo9NRQCqg3sxTQAOwDfhf4UTx+A7CqwOcQEZESmnQQuPubwF8BbxAFwBGgA+hx96F4si5gwVjzm9k6M9tiZltC/DFqEZFqUcipoRbgWmARMB9oBK6a6Pzuvt7d29y9bc6cOZMtQ0REClTIqaFPAb9194PuPgg8CnwcaI5PFQG0Am8WWKOIiJRQIUHwBrDczBrMzIArgVeAZ4DPx9OsAR4rrEQRESmlQq4RbCa6KPwC8FK8rPXAN4A/NbNdwCzgviLUKSIiJZI6/ST5ufudwJ2jmncDlxayXBERKR99s1hEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwBUUBGbWbGY/MrPXzOxVM7vMzGaa2VNmtjPuthSrWBERKb5Cjwi+Azzp7h8GPgq8CtwGPO3u5wNPx8MiIlKlJh0EZtYEXAHcB+DuA+7eA1wLbIgn2wCsKrRIEREpnUKOCBYBB4F/MLMXzexeM2sE5rr7vnia/cDcsWY2s3VmtsXMthw8eLCAMkREpBCFBEEKuAT4nrtfDBxn1Gkgd3fAx5rZ3de7e5u7t82ZM6eAMkREpBCFBEEX0OXum+PhHxEFwwEzmwcQd98urEQRESmlSQeBu+8H9prZBXHTlcArwCZgTdy2BnisoApFRKSkUgXO/yfAg2ZWA+wGbiQKl4fN7CZgD3Bdgc8hIiIlVFAQuPtWoG2MUVcWslwRESkffbNYRCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAldwEJhZ0sxeNLPH4+FFZrbZzHaZ2UYzqym8TBERKZViHBF8FXg1Z/hu4B53Pw/oBm4qwnOIiEiJFBQEZtYKfBa4Nx424HeBH8WTbABWFfIcIiJSWoUeEfw34OtAJh6eBfS4+1A83AUsGGtGM1tnZlvMbMvBgwcLLENERCZr0kFgZiuBt929YzLzu/t6d29z97Y5c+ZMtgwRESlQqoB5Pw5cY2YrgDpgBvAdoNnMUvFRQSvwZuFliohIqUz6iMDd/7O7t7r7OcBq4P+6+xeAZ4DPx5OtAR4ruEoRESmZUnyP4BvAn5rZLqJrBveV4DlERKRICjk1NMLdnwWejft3A5cWY7kiIlJ6+maxiEjgFAQiIoFTEIiIBE5BICISOAWBiEjgFAQiIoFTEIiIBE5BICISOAWBiEjgFAQiIoFTEIiIBE5BICISOAWBiEjgFAQiIoFTEIiIBE5BICISuKoIgv6hTKVLEBEJVlUEwd7DvRzpHax0GSIiQaqKIOgbHOb6v3+eQ+/2V7oUEZHgVEUQfHBWI7vfeZc/WP88+4/0VbocEZGgVEUQTK9LseHGS9nXc4Lrvv/P7D3cW+mSRESCURVBAPA7587iwf+0nJ7eAa77/j+z++C7lS5JRCQIVRMEAMsWNvPQussYGMpw3fef57X9RytdkojIGa+qggBg8fwZbPzSZaQSxh98/3m27e2pdEkiIme0qgsCgPPOmsYjX76MGfUpvnDvZn7debjSJYmInLGqMggAFs5s4OEvXcZZM2r5w/s289zOg5UuSUTkjFS1QQAwr6meh790GefMauSm+7fw1CsHKl2SiMgZp6qDAGD2tFoeWrecC+fP4MsPdLBp21uVLklE5IxS9UEA0NxQwwM3XcrHPtjCVx96kYd/vbfSJYmInDGmRBAATK9Ls+HGS7n8vNl8/cfbuf9Xv610SSIiZ4QpEwQA9TVJ7l3TxqcXz+Wun77C/3h2V6VLEhGZ8qZUEADUppL87Rcu4dpl8/mLJ3/DX/38N7h7pcsSEZmyUpUuYDLSyQR/fd0y6tNJ/uaZXRwfGOKOlYsxs0qXJiIy5UzJIABIJoz/8vsXUV+T5B9+1UnHnm6u+eh8Prt0HvOa6itdnojIlGGTPa1iZguBHwJzAQfWu/t3zGwmsBE4B+gErnP37vGW1dbW5lu2bJlUHe7O//qXN2j/lzfY8WZ0b6J/e04LK5fO5zMXfYCzptdNarkiItXOzDrcva3g5RQQBPOAee7+gplNBzqAVcAXgcPu/udmdhvQ4u7fGG9ZhQRBrt++c5wntr/FT7ft4zcHjpEwWH7uLFYunc9VSz7AzMaagp9DRKRaVDwI3rMgs8eAv4kfn3T3fXFYPOvuF4w3b7GCINfOA8f46fZ9PL7tLXa/c5xkwvj4ebNZuXQev/eRD9BUny7q84mIlFtVBYGZnQP8ElgCvOHuzXG7Ad3Z4VHzrAPWAZx99tkf27NnT8F1jMXdeWXfUR7fvo/Ht7/F3sMnqEkmuOJDs1m5dD6fWjyXabVT9lKJyLgyGad3cJjegSEAkmYkE0YiYaQSRiIeTlrUNlnuznDGGcqM7mai7vDJ9qFMhqHh7DQZBuNxg8OZuHty3ux8g5mT4xIGNakE6WSC2rhbk0yQTkXdmpxuOmmnDGfnS5iRMKb8B0yqJgjMbBrwC+Db7v6omfXk7vjNrNvdW8ZbRimOCMbi7mzrOsLj297iiZf2se9IH7WpBP/+grNY+dF5LD93FrOn1Za8Dqkug8MZjvUNceTEIEdPDHK0bzDuH+J4/xCJhFGTNNLJBKlkvHPJ059OJuLHyf5EAjIZRnaKw6N2ltn+jEc7zIyf3EkOZ2A4k6F/KMPx/miHPtIdGKK3f5jegeGof2CY4/1RNzvdicHhCa8HM0YCIRsY2Ud2xzlS23BO/XEITGXZUEgYGAbGSL8ZJMwwonVk8bqpTSWoSydHunXpuJvK6U8nqU0n4rbc9gSNNSlaGmtoaahhZmMNTfVpku8zjIsVBAW9FTazNPBj4EF3fzRuPmBm83JODb1daJHFYmYsW9jMsoXN3L7iQjre6I5DYT9PvrwfgIUz67l4YQvLFjZz8dnNLJ4/g9pUssKVT33uzrH+IY70Do7scHtORP3ZR09v1J4dHhzOUJvzLi77zi6dSlCbPLU9261Nndw5p1MJTgwMxzv3oZFlH+2LdvLZ/t6Bie8sq0VtKkFjbYqGmmT8SNFYm2RmYwONNUkaalM0pKNuYzwNZgwPZxj26EghuwPP5ARRdseeyfhICEXTRX/DZHwkkUwkSCYgmUjEw3F78uT4U9oTRioZBUo6GY1LJY1UItufIJkw0kmLu4mReVPZ6RPR9BmPjh76hzIMDmcYGM4wEPdHbT4yPDAUP3KmGRjKkHFwnIwD7jhRyLkzMg4/2ZY73j1aR/2DGfqGMvQNDtM3OEz/YIbDxwfi4ZPtfXENp2MGTfVpWhpqaGlIM7OxhuY4JLJtLY3Z4Wi6YinkYrEBG4guDH8tp/0vgUM5F4tnuvvXx1tWuY4I8hnOOC++0c0Lb3Tz4hs9bN3bw74jfQDUJBMsnj9jJBguObuF1pb6KX1I6e4cPNbP3u4TdHX30pXT3Xu4l/1Ho9eeTiRy/rHjf9qknbIzOPkPfXKaZMLIuJ+yUz/aNzTuu8ZUwmhuSDOjPk1T/EgnE6f+Qw+P/U89kP3nHx77n80MptemmFGfZkZdtOwZ9amc/jQz6lI0NUTjszXMqEvTWJsk49FRw+BwdKpiYHR//PyDmZP9Q5mTdQ27j6yfpNnIOhz9rjuViI4esuswu54TZtSlEzTUpGioTdKQTpJKTrnvggYtk3H6s6ExFAXF8f4hunsHOHx8gO7jA3T3DtLdG3ePR+09vQMc7h2gb3DsbXvP3Ssr/qmhy4HngJeAbJW3A5uBh4GzgT1EHx8d95dlKh0EY9l/pI+te6NgeHFvD9u7ekb+GLMaa7j47OY4HFpY2trE9LrxLz4PDkd/+Hfjx/H+IY71RYfvx/uHOBa3ZdypTyepr4kOJeuzj9zhmpPtdTXRu+TcYHJ3Dh0fGNnB7z0cd+PhN7tP0D/qHcrsaTUsaGmgtaWe+U11mFl8fjYTndfNOaebbzj3nDDAjPo0zQ01NNWnRnbuzfU1p+zsmxuibkNNsuBwdfeRQMgGRV06yfTaVEHnv0Uq7cTA8Eho9PQOcrg3Co8vfnxRdVwjKIZqDILRhoYzvLb/GFv39sRHDd28fvA4EL3jPP+saZx31jT6BjO82xfv7AeGRvpH73iLKWGMBERtKsnh4wPvOTfc0pCmNd7RL5wZdVtb6lnY0sCClnoaanTBXGSqqYprBCFJJRMsWdDEkgVN3LD8gwAc6R1kW9fJYHht3zEaapM01qSY31xHY22KxtoU0+PutOyjLjucZFptdPphem2ahtokSTP6hoY5MRBd6OsbHObEQIYTg9HwiYG4bXD0NMMj07Q01Izs5Ftn1rOguf60RywiEi4FQQGaGtJc8aE5XPGhOUVdbkNNSu/QRaRsdMVJRCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwJQkCM7vKzH5jZrvM7LZSPIeIiBRH0YPAzJLA3wKfARYD15vZ4mI/j4iIFEcpjgguBXa5+253HwAeAq4twfOIiEgRpEqwzAXA3pzhLuB3Rk9kZuuAdfFgv5ntKEEtxTYbeKfSRUyA6iyeqVAjqM5imyp1XlCMhZQiCCbE3dcD6wHMbIu7t1WqlolSncU1FeqcCjWC6iy2qVRnMZZTilNDbwILc4Zb4zYREalCpQiCXwPnm9kiM6sBVgObSvA8IiJSBEU/NeTuQ2b2FeDnQBL4gbu/fJrZ1he7jhJRncU1FeqcCjWC6iy2oOo0dy/GckREZIrSN4tFRAKnIBARCVxZg8DMOs3sJTPbOtbHnizy3fjWFNvN7JJy1hfXcEFcX/Zx1My+NmqaT5rZkZxp7ihTbT8ws7dzv3NhZjPN7Ckz2xl3W/LMuyaeZqeZralAnX9pZq/Ff9efmFlznnnH3UZKXONdZvZmzt91RZ55y3YLlTx1bsypsdPMtuaZtyzrMn6uhWb2jJm9YmYvm9lX4/aq2j7HqbPats98dZZmG3X3sj2ATmD2OONXAP8IGLAc2FzO+saoJwnsBz44qv2TwOMVqOcK4BJgR07bXwC3xf23AXePMd9MYHfcbYn7W8pc56eBVNx/91h1TmQbKXGNdwG3TmCbeB04F6gBtgGLy1nnqPH/Fbijkusyfq55wCVx/3TgX4luMVNV2+c4dVbb9pmvzpJso9V2auha4IceeR5oNrN5FaznSuB1d99TwRpGuPsvgcOjmq8FNsT9G4BVY8z6e8BT7n7Y3buBp4Crylmnu/+Tuw/Fg88Tfb+kYvKsy4ko6y1UxqvTzAy4Dmgv1fNPlLvvc/cX4v5jwKtEdxmoqu0zX51VuH3mW58T8b630XIHgQP/ZGYdFt1iYrSxbk8x0RdfCqvJ/092mZltM7N/NLOPlLOoUea6+764fz8wd4xpqm29riU68hvL6baRUvtKfHrgB3lOY1TTuvwEcMDdd+YZX5F1aWbnABcDm6ni7XNUnbmqavsco86ib6PlDoLL3f0SojuT/rGZXVHm558wi74Mdw3wyBijXyA6XfRR4L8D/7ucteXj0XFhVX8e2My+CQwBD+aZpJLbyPeAfwMsA/YRnXapZtcz/tFA2delmU0Dfgx8zd2P5o6rpu0zX53Vtn2OUWdJttGyBoG7vxl33wZ+QnQIk6uabk/xGeAFdz8weoS7H3X3d+P+nwFpM5td7gJjB7Knz+Lu22NMUxXr1cy+CKwEvhDvFN5jAttIybj7AXcfdvcM8Pd5nrta1mUK+H1gY75pyr0uzSxNtNN60N0fjZurbvvMU2fVbZ9j1VmqbbRsQWBmjWY2PdtPdHFm9B1HNwF/ZJHlwJGcw8pyy/tuy8w+EJ+fxcwuJVqPh8pYW65NQPZTFmuAx8aY5ufAp82sJT6U/HTcVjZmdhXwdeAad+/NM81EtpFS1ph7Peo/5HnuarmFyqeA19y9a6yR5V6X8f/DfcCr7v7XOaOqavvMV2e1bZ/j1FmabbTUV79zrmSfS3T1ehvwMvDNuP3LwJfjfiP6UZvXgZeAtnLVN6rWRqIde1NOW26dX4lfwzaiC0v/rkx1tRMdDg4Snfe7CZgFPA3sBP4PMDOetg24N2fetcCu+HFjBercRXTecmv8+Lt42vnAz8bbRspY4/+Mt7vt8T/OvNE1xsMriD7F8Xopa8xXZ9x+f3Z7zJm2Iusyfr7LiU77bM/5G6+otu1znDqrbfvMV2dJtlHdYkJEJHDV9vFREREpMwWBiEjgFAQiIoFTEIiIBE5BICISOAWBBCW+e+Ot44y/38w+H/c/a2ZV/wPmIoVSEIgUiZklK12DyGQoCOSMZ2bfNLN/NbP/B1wQty0zs+dz7j8/5n3yc5bxPTPbEt8b/s9y2jvN7G4zewH4j2Z2c3wP+e1m9lBpX5lIcRT9x+tFqomZfYzoK/bLiLb3F4AO4IfAn7j7L8zsW8CdwNfyLij6dubh+F3/02a21N23x+MOeXQjMszsLWCRu/dbnh83Eak2OiKQM90ngJ+4e69Hd2/cRHQLkWZ3/0U8zQaiH4AZz3Xxu/4XgY8Q/UhIVu6N37YDD5rZDUR3sRSpegoCkdMws0XArcCV7r4UeAKoy5nkeE7/Z4nul3UJ8Ov4LqEiVU1BIGe6XwKrzKw+vnPk1UQ77m4z+0Q8zR8Cv8i3AGBGPM8RM5tLdIvy9zCzBLDQ3Z8BvgE0AdOK8zJESkfvVuSM5u4vmNlGojtGvk10i16Ibon8d2bWQPQbuTeOs4xtZvYi8BrRHSp/lWfSJPCAmTUR3Un3u+7eU5xXIlI6uvuoiEjgdGpIRCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAvf/AVJ96LjJ8DTVAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "eval_query = create_faceted_eval_query( create_input_dataset_fc('EVAL'), 'demos.taxifare_model_fc')\n",
    "eval_df = bq.query(eval_query, project=PROJECT).to_dataframe()\n",
    "ax = eval_df.plot(x='dollars', y='MAPE');\n",
    "ax.set_xlim(5, 25)\n",
    "ax.set_ylim(0,100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that, with the feature crosses and spatial functions, we have gotten a lower RMSE and somewhat addressed the problem of errors increasing with fare amount."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## More data?\n",
    "\n",
    "What if we train on more data? Note the sample=100 to use 10 million rows.  This will take <b> 10-15 min </b>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<google.cloud.bigquery.table._EmptyRowIterator at 0x7fa39d341748>"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_query = create_train_query( create_input_dataset_fc('TRAIN', sample=100), 'demos.taxifare_model_fc10' )\n",
    "bq.query(train_query, project=PROJECT).result()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT iteration, loss from ML.TRAINING_INFO(MODEL demos.taxifare_model_fc10)\n",
      "   iteration      loss\n",
      "0          0  3.907224\n"
     ]
    }
   ],
   "source": [
    "show_training_loss('demos.taxifare_model_fc10')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_absolute_error</th>\n",
       "      <th>mean_squared_error</th>\n",
       "      <th>mean_squared_log_error</th>\n",
       "      <th>median_absolute_error</th>\n",
       "      <th>r2_score</th>\n",
       "      <th>explained_variance</th>\n",
       "      <th>rmse</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.223445</td>\n",
       "      <td>23.050299</td>\n",
       "      <td>0.065663</td>\n",
       "      <td>1.415918</td>\n",
       "      <td>0.733092</td>\n",
       "      <td>0.733094</td>\n",
       "      <td>4.801073</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
       "0             2.223445           23.050299                0.065663   \n",
       "\n",
       "   median_absolute_error  r2_score  explained_variance      rmse  \n",
       "0               1.415918  0.733092            0.733094  4.801073  "
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eval_query = create_eval_query( create_input_dataset_fc('EVAL'), 'demos.taxifare_model_fc10' )\n",
    "eval_df = bq.query(eval_query, project=PROJECT).to_dataframe()\n",
    "eval_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's better (\\$4.80~ vs~ \\$4.96, which is promising). We have to experiment with changing the resolution of the feature cross also -- because we have more data, it is possible that we could use more feature crosses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0, 100)"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAGuRJREFUeJzt3XuQVOd55/Hv05e5MDNiBphgmNEFx7LuGKEpWVnLXlWUVWQkIRR7VWh9wUIVNlV2FCurxIpcJSmu8q61m7XWjrecYMsRipURsiIH4otsIuu2qRIxgwAhoRiEB3kAAYIZEDD3fvaPc3roGWZgmO6e7uH9fYquc/o9p08/3XN4f33e7j5t7o6IiIQrUeoCRESktBQEIiKBUxCIiAROQSAiEjgFgYhI4BQEIiKBO20QmNn3zGy/mW3NaZthZuvMbHs8bYjbzcy+aWY7zGyLmS0sZvEiIpK/8RwRPAbcOKLtPuA5d78QeC6+DvBx4ML4sgL4dmHKFBGRYjltELj7S8ChEc23Aqvi+VXAkpz2xz3yClBvZnMKVayIiBReaoK3m+3ue+P5d4DZ8XwT8Juc9Tritr2MYGYriI4aqKmpueriiy+eYCkiImFqa2t7190b893ORINgiLu7mZ3xeSrcfSWwEqClpcU3bNiQbykiIkExs12F2M5EPzW0LzvkE0/3x+27gXNz1muO20REpExNNAjWAsvi+WXAmpz2z8afHroGOJwzhCQiImXotENDZtYKXAfMMrMO4EHga8BTZnYXsAu4PV79J8AiYAdwHLizCDWLiEgBnTYI3P2OMRZdP8q6Dnw+36JEREbq7++no6ODnp6eUpcy6aqqqmhubiadThdl+3m/WSwiMhk6Ojqoq6vjggsuwMxKXc6kcXcOHjxIR0cH8+bNK8p96BQTIjIl9PT0MHPmzKBCAMDMmDlzZlGPhBQEIjJlhBYCWcV+3AoCEZHAKQhERMbJzPj0pz89dH1gYIDGxkZuvvnmYestWbKEa665ZljbQw89RFNTEwsWLODyyy9n7dq1J7VnL11dXcV/MDkUBCIi41RTU8PWrVvp7u4GYN26dTQ1NQ1bp6uri7a2Ng4fPszOnTuHLbvnnnvYtGkTP/jBD1i+fDmZTGZYe/ZSX18/OQ8opiAQETkDixYt4sc//jEAra2t3HHH8E/YP/PMM9xyyy0sXbqUJ598ctRtXHLJJaRSKd59992i1zse+vioiEw5f/nPr/PGniMF3ealc8/hwVsuO+16S5cu5Stf+Qo333wzW7ZsYfny5bz88stDy1tbW3nggQeYPXs2n/jEJ7j//vtP2sb69etJJBI0Nkbni3vkkUf4/ve/D0BDQwPPP/98gR7V+CgIRETOwPz582lvb6e1tZVFixYNW7Zv3z62b9/Otddei5mRTqfZunUrl19+OXCiw6+rq2P16tVDnwa65557uPfeeyf9sWQpCERkyhnPK/diWrx4Mffeey8vvPACBw8eHGp/6qmn6OzsHPri15EjR2htbeWrX/0qUPoOfyx6j0BE5AwtX76cBx98kCuuuGJYe2trK88++yzt7e20t7fT1tY25vsE5URBICJyhpqbm7n77ruHtbW3t7Nr165hHxudN28e06dPZ/369afc3iOPPDLs46Pt7e3FKHtMFp0nrrT0wzQicjrbtm3jkksuKXUZJTPa4zezNndvyXfbOiIQEQmcgkBEJHAKAhGZMsphKLsUiv24FQQiMiVUVVVx8ODB4MIg+3sEVVVVRbsPfY9ARKaE5uZmOjo6OHDgQKlLmXTZXygrFgWBiEwJ6XS6aL/QFToNDYmIBE5BICISOAWBiEjgFAQiIoFTEIiIBE5BICISOAWBiEjgFAQiIoFTEIiIBE5BICISOAWBiEjgFAQiIoFTEIiIBE5BICISOAWBiEjgFAQiIoHLKwjM7B4ze93MtppZq5lVmdk8M1tvZjvMbLWZVRSqWBERKbwJB4GZNQF3Ay3ufjmQBJYCDwOPuPsHgE7grkIUKiIixZHv0FAKqDazFDAN2Av8LvB0vHwVsCTP+xARkSKacBC4+27gr4C3iQLgMNAGdLn7QLxaB9A02u3NbIWZbTCzDSH+GLWISLnIZ2ioAbgVmAfMBWqAG8d7e3df6e4t7t7S2Ng40TJERCRP+QwN/R7wa3c/4O79wDPAR4D6eKgIoBnYnWeNIiJSRPkEwdvANWY2zcwMuB54A3ge+GS8zjJgTX4liohIMeXzHsF6ojeFNwKvxdtaCXwJ+FMz2wHMBB4tQJ0iIlIkqdOvMjZ3fxB4cETzTuDqfLYrIiKTR98sFhEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRweQWBmdWb2dNm9qaZbTOz3zGzGWa2zsy2x9OGQhUrIiKFl+8RwTeAZ939YuBDwDbgPuA5d78QeC6+LiIiZWrCQWBm04GPAY8CuHufu3cBtwKr4tVWAUvyLVJERIonnyOCecAB4O/M7FUz+66Z1QCz3X1vvM47wOzRbmxmK8xsg5ltOHDgQB5liIhIPvIJghSwEPi2u18JHGPEMJC7O+Cj3djdV7p7i7u3NDY25lGGiIjkI58g6AA63H19fP1pomDYZ2ZzAOLp/vxKFBGRYppwELj7O8BvzOyiuOl64A1gLbAsblsGrMmrQhERKapUnrf/Y+AJM6sAdgJ3EoXLU2Z2F7ALuD3P+xARkSLKKwjcfRPQMsqi6/PZroiITB59s1hEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCpyAQEQmcgkBEJHAKAhGRwCkIREQCl3cQmFnSzF41sx/F1+eZ2Xoz22Fmq82sIv8yRUSkWApxRPAnwLac6w8Dj7j7B4BO4K4C3IeIiBRJXkFgZs3ATcB34+sG/C7wdLzKKmBJPvchIiLFle8Rwf8B/hzIxNdnAl3uPhBf7wCaRruhma0wsw1mtuHAgQN5liEiIhM14SAws5uB/e7eNpHbu/tKd29x95bGxsaJliEiInlK5XHbjwCLzWwRUAWcA3wDqDezVHxU0Azszr9MEREplgkfEbj7X7h7s7tfACwFfuHunwKeBz4Zr7YMWJN3lSIiUjTF+B7Bl4A/NbMdRO8ZPFqE+xARkQLJZ2hoiLu/ALwQz+8Eri7EdkVEpPj0zWIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwZREER7r7S12CiEiwyiIIOjq72dLRVeoyRESCVBZBkEwYn/rOetp2HSp1KSIiwSmLIHh/Yw2z6ir5zKP/xis7D5a6HBGRoJRFEKSTCVavuIam+mo+93f/xsvbD5S6JBGRYJRFEAD81jlVPLniGubNquWuVRv4xZv7Sl2SiEgQyiYIAGbWVtL6hx/motl1/Ne/b+PZre+UuiQRkbNeWQUBQP20Cp74ww9zRdN0Pv8PG1m7eU+pSxIROauVXRAAnFOV5vG7PkzL+Q188clXebqto9QliYictcoyCABqK1M8dufVfOQDs/izpzfzD+vfLnVJIiJnpbINAoDqiiTf+WwL132wkft/+BqP/euvS12SiMhZp6yDAKAqneRvP9PC7182m4f++Q3+9sW3Sl2SiMhZpeyDAKAileBb/2UhN8+fw//46Zt887ntpS5JROSskSp1AeOVTib4xtIrqUgl+Pq6X9E3kOG/3fBBzKzUpYmITGlTJgggOifRX33yQ1SmEnzr+R30Dgxy/6JLFAYiInmYUkEAkEgY//22K6hMJfnOy7+mdyDDQ7dcRiKhMBARmYgpFwQAZsaDt1xKRSrBypd20jeQ4au3XUFSYSAicsamZBBAFAZ/8fGLqUwl+Otf7ODVt7tYcmUTixfMpam+utTliYhMGebuE7uh2bnA48BswIGV7v4NM5sBrAYuANqB292981Tbamlp8Q0bNkyoDoB/bOvgifW72Ph29OM2V18wg8UL5rLoijnMqKmY8HZFRMqZmbW5e0ve28kjCOYAc9x9o5nVAW3AEuBzwCF3/5qZ3Qc0uPuXTrWtfIMg6+2Dx1m7eTf/tGkPO/YfJZUw/uMHG1m8YC7/6dLZTKuYsgdAIiInKXkQnLQhszXAt+LLde6+Nw6LF9z9olPdtlBBkOXubNv7Hms27Wbt5j3sPdxDdTrJDZfN5tYFc/nohY2kk1PiKxQiImMqqyAwswuAl4DLgbfdvT5uN6Aze33EbVYAKwDOO++8q3bt2pV3HaPJZJxfth9izeY9/OS1vXQd76dhWpqb5s/h1gVNXHVegz5xJGe1wYxztHcAgHTSSCaMdCJR9P3e3RnIOAODTn8mw8CgM5DJkMkwNB10ZzCTYTAT1TmY8bjtxCUTbycTXzeDVDJBOmGkkglSyejxpJJGOmmk4vnsNLssO382/X8vmyAws1rgReCr7v6MmXXldvxm1unuDafaRqGPCMbSN5DhpV8dYM3mPax74x16+jM01VezeMFcFn9oLhe/r07fSRDcnd6BDO/1DNDdN4hZ9B2WZMJImA3NJxNG0oxEApJx+5nuP+4nOr9MBjKenXcyzlBH2NM/yHs9Axzp6ee9noH4Es0f7Y3mj4xoz06P9w2Oet8JI6fTjDvVhJFOJkgmTnScyUTUwSYSxmDG6R90BgYzUScfd/D9g7nz0bLBTGFGGwotYZAwwwyM+O9lYDDUZkPX4zVOrIaZkTCoTCWpTCeoiqeVqUTUlkrE1+P5VILKdJKqeJptq6lMMbO2kpk1FcysrWBGTQWVqeQZPZayCAIzSwM/An7m7l+P2/6dEg8Njcex3gF+/sY7rNm0h5e3v8tgxplRU8HC8xq46vzoMr95OlXpM/vDyOm5O8f7Bjnc3c/h7n66jkfTI939dHX3DbUPDHr8Ki8RX6JXeRWpEx1WOmd5KmlUJBPRq8W4vad/kKO9UWd5tGeAY70DvNcbTaP2QY729HOsN2e93oEJd2KJODQSFnWuiUTUkbgz9Eo3d74QqtIJ6qrS1FWlqKtKc05VitrK1ND1uvg6RMEykIk662ynPpjJDHXkJ3X02Q4+7tiTiehvkE7a0KvyZBwi2b9POn71nUyceNWejsMmGYdNFKAnnqPksICFZCIxFLKpRIJkgqEQdo+OKKI6c442BjP0Z+LaB30oqHIfT3+8LOOOx38XAMeJ/+Ee/Y2yy6M1c9aNw7pvIEPvQIbe/gy9A4PR/ECGnv7s/GC8LJrv6c+c9m9ZV5ViVk44zKytZFZNNJ1ZW8HMmkpmxe311WmSyURBgmDC757Gwz6PAtuyIRBbCywDvhZP1+RVYZHUVKa47cpmbruymYNHe1n3xj427Opk465O/mVb9DOZqYRx2dxzWHj+iXCYM/3s+WhqJuMcONpLR2c3HZ3H6ejsZndXN/sO9wAM+4862n/eVE6HlxxxyWR8qEM/3N1PV/eJzv5wdz/9g2N3ggmD6dVp0snE0H/e/pz/3Pkwg9qKFDWVKWqromldZYrGukpqK9PUViaHtVdXpHA/eXhi0Bka0siMGMrIvqLPdrrRY7ITIZGI5y07f+JoIzdIkvF6iYRRlUoO69hz5/V+19TgHoVsTxwQR3sHOHSsl3eP9nHwaB8Hj/Zy8Fgf7x7t5eDRPn797jE2tHfSebyP0Xb7Qn5vKp9PDV0LvAy8BmSj7n5gPfAUcB6wi+jjo4dOta1SHBGcyqFjfbz6didtu6LL5o6uoTSfO72Khec3DB05XDr3nLz+I/YPZqJXqT0DHOuLXolWxYePudOJ3Mdgxtn/Xk/Uwed09tkOf3dnN32Dw1+lzKip4H3nVJFIMPTKKdsBDpvG7YMjLrkddV1VivppaaZXR5f66grOyc4Pa08Pa6+tTI05xJK9/2ww9A1mold9A9Erw2Htg05lKkFt/Iq4tjJFdTp5Vo0Ry9lvMON0He8bFhIHj0YB8mc3Xlz6oaFCKbcgGKl/MMO2vUdo29XJxre72Lirk91d3UB0WD6/uZ6rzm/gg7Nr6enP6dh7o859aL53cGhYIjtE0Tdw+sNFiNJ/ZDhU5ow5VsVjkBWpBIeO9bG7q5s9Xd0nvfKeVVtBU8M0mhuqo0t9Nc3x9bn11dRU5v8R20z2VbA6XJGiKov3CAql3INgNHsPd7NxV1ccDp28vufwSZ3utIpkNAQRX2oqk0PDD9mhidxhitrKFAmznLHFaFwxO82OPY427RnI0Ns/SN9AhunT0kOde3NDNU1xZ99UX011hd7zEDlbFCoI9A2rCZozvZqb5ldz0/w5APT0D9LR2U1N3MnXVKR07iMRmRIUBAVSlU7ygd+qLXUZIiJnTB83EBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAqcgEBEJnIJARCRwCgIRkcApCEREAleUIDCzG83s381sh5ndV4z7EBGRwih4EJhZEvi/wMeBS4E7zOzSQt+PiIgURjGOCK4Gdrj7TnfvA54Ebi3C/YiISAGkirDNJuA3Odc7gA+PXMnMVgAr4qu9Zra1CLUU2izg3VIXMQ6qs3CmQo2gOgttqtR5USE2UowgGBd3XwmsBDCzDe7eUqpaxkt1FtZUqHMq1Aiqs9CmUp2F2E4xhoZ2A+fmXG+O20REpAwVIwh+CVxoZvPMrAJYCqwtwv2IiEgBFHxoyN0HzOwLwM+AJPA9d3/9NDdbWeg6ikR1FtZUqHMq1Aiqs9CCqtPcvRDbERGRKUrfLBYRCZyCQEQkcJMaBGbWbmavmdmm0T72ZJFvxqem2GJmCyezvriGi+L6spcjZvbFEetcZ2aHc9Z5YJJq+56Z7c/9zoWZzTCzdWa2PZ42jHHbZfE6281sWQnq/F9m9mb8d/2hmdWPcdtT7iNFrvEhM9ud83ddNMZtJ+0UKmPUuTqnxnYz2zTGbSfluYzv61wze97M3jCz183sT+L2sto/T1Fnue2fY9VZnH3U3SftArQDs06xfBHwU8CAa4D1k1nfKPUkgXeA80e0Xwf8qAT1fAxYCGzNafufwH3x/H3Aw6PcbgawM542xPMNk1znDUAqnn94tDrHs48UucaHgHvHsU+8BbwfqAA2A5dOZp0jlv9v4IFSPpfxfc0BFsbzdcCviE4xU1b75ynqLLf9c6w6i7KPltvQ0K3A4x55Bag3szklrOd64C1331XCGoa4+0vAoRHNtwKr4vlVwJJRbvr7wDp3P+TuncA64MbJrNPdf+7uA/HVV4i+X1IyYzyX4zGpp1A5VZ1mZsDtQGux7n+83H2vu2+M598DthGdZaCs9s+x6izD/XOs53M8zngfnewgcODnZtZm0SkmRhrt9BTjffDFsJSx/5P9jpltNrOfmtllk1nUCLPdfW88/w4we5R1yu15XU505Dea0+0jxfaFeHjge2MMY5TTc/lRYJ+7bx9jeUmeSzO7ALgSWE8Z758j6sxVVvvnKHUWfB+d7CC41t0XEp2Z9PNm9rFJvv9xs+jLcIuBH4yyeCPRcNGHgL8G/mkyaxuLR8eFZf15YDP7MjAAPDHGKqXcR74N/DawANhLNOxSzu7g1EcDk/5cmlkt8I/AF939SO6ycto/x6qz3PbPUeosyj46qUHg7rvj6X7gh0SHMLnK6fQUHwc2uvu+kQvc/Yi7H43nfwKkzWzWZBcY25cdPoun+0dZpyyeVzP7HHAz8Km4UzjJOPaRonH3fe4+6O4Z4Dtj3He5PJcp4A+A1WOtM9nPpZmliTqtJ9z9mbi57PbPMeosu/1ztDqLtY9OWhCYWY2Z1WXnid6cGXnG0bXAZy1yDXA457Byso35asvM3hePz2JmVxM9jwcnsbZca4HspyyWAWtGWednwA1m1hAfSt4Qt00aM7sR+HNgsbsfH2Od8ewjxawx9/2o28a473I5hcrvAW+6e8doCyf7uYz/PzwKbHP3r+csKqv9c6w6y23/PEWdxdlHi/3ud8472e8nevd6M/A68OW4/Y+AP4rnjehHbd4CXgNaJqu+EbXWEHXs03Pacuv8QvwYNhO9sfQfJqmuVqLDwX6icb+7gJnAc8B24F+AGfG6LcB3c267HNgRX+4sQZ07iMYtN8WXv4nXnQv85FT7yCTW+Pfxfrcl/o8zZ2SN8fVFRJ/ieKuYNY5VZ9z+WHZ/zFm3JM9lfH/XEg37bMn5Gy8qt/3zFHWW2/45Vp1F2Ud1igkRkcCV28dHRURkkikIREQCpyAQEQmcgkBEJHAKAhGRwCkIJCjx2RvvPcXyx8zsk/H8C2ZW9j9gLpIvBYFIgZhZstQ1iEyEgkDOemb2ZTP7lZn9P+CiuG2Bmb2Sc/75Uc+Tn7ONb5vZhvjc8H+Z095uZg+b2UbgP5vZ3fE55LeY2ZPFfWQihVHwH68XKSdmdhXRV+wXEO3vG4E24HHgj939RTP7CvAg8MUxNxR9O/NQ/Kr/OTOb7+5b4mUHPToRGWa2B5jn7r02xo+biJQbHRHI2e6jwA/d/bhHZ29cS3QKkXp3fzFeZxXRD8Ccyu3xq/5XgcuIfiQkK/fEb1uAJ8zs00RnsRQpewoCkdMws3nAvcD17j4f+DFQlbPKsZz5m4jOl7UQ+GV8llCRsqYgkLPdS8ASM6uOzxx5C1HH3WlmH43X+Qzw4lgbAM6Jb3PYzGYTnaL8JGaWAM519+eBLwHTgdrCPAyR4tGrFTmruftGM1tNdMbI/USn6IXolMh/Y2bTiH4j985TbGOzmb0KvEl0hsp/HWPVJPB9M5tOdCbdb7p7V2EeiUjx6OyjIiKB09CQiEjgFAQiIoFTEIiIBE5BICISOAWBiEjgFAQiIoFTEIiIBO7/A/hM3dkfezZjAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "eval_query = create_faceted_eval_query( create_input_dataset_fc('EVAL'), 'demos.taxifare_model_fc10')\n",
    "eval_df = bq.query(eval_query, project=PROJECT).to_dataframe()\n",
    "ax = eval_df.plot(x='dollars', y='MAPE');\n",
    "ax.set_xlim(5, 25)\n",
    "ax.set_ylim(0,100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Geo visualization\n",
    "\n",
    "Instead of grouping by the total amount, we can group by a spatial thing. Let's look at how the taxifare error varies depending on the dropoff point, by running the following query in the BigQuery Geo Viz:\n",
    "<pre>\n",
    "WITH predictions AS (\n",
    "  SELECT\n",
    "    ABS(total_fare - predicted_total_fare)/total_fare AS error,\n",
    "    total_fare, pickup_gridpt, dropoff_gridpt\n",
    "  FROM\n",
    "  ML.PREDICT(MODEL demos.taxifare_model_fc,\n",
    "  (\n",
    "  \n",
    "WITH params AS (\n",
    "  SELECT\n",
    "  0.1 AS RES,\n",
    "  1 AS TRAIN,\n",
    "  2 AS EVAL\n",
    "  ),\n",
    "daynames AS\n",
    "  (SELECT ['Sun', 'Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'] AS daysofweek),\n",
    "  \n",
    "taxitrips AS (\n",
    "SELECT\n",
    "  (tolls_amount + fare_amount) AS total_fare,\n",
    "  daysofweek[ORDINAL(EXTRACT(DAYOFWEEK FROM pickup_datetime))] AS dayofweek,\n",
    "  EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
    "  ST_GeogPoint(pickup_longitude, pickup_latitude) AS pickup,\n",
    "  ST_GeogPoint(dropoff_longitude, dropoff_latitude) AS dropoff,\n",
    "  passenger_count AS passengers\n",
    "FROM\n",
    "  `nyc-tlc.yellow.trips`, daynames, params\n",
    "WHERE\n",
    "  trip_distance > 0 AND fare_amount > 0 AND fare_amount < 100\n",
    "  and fare_amount >= 2.5 and pickup_longitude > -78 and pickup_longitude < -70 \n",
    "      and dropoff_longitude > -78 and dropoff_longitude < -70 and pickup_latitude > 37 \n",
    "      and pickup_latitude < 45 and dropoff_latitude > 37 and dropoff_latitude < 45 \n",
    "      and passenger_count > 0\n",
    "  AND ABS(MOD(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING)), 1000)) = params.EVAL\n",
    "),\n",
    "feateng AS (\n",
    "  SELECT \n",
    "    total_fare,\n",
    "    ST_Distance(pickup, dropoff) AS euclidean,\n",
    "    CONCAT(dayofweek, CAST(hourofday AS STRING)) AS dayhr_fc,\n",
    "    CONCAT(ST_AsText(ST_SnapToGrid(pickup, params.RES)),\n",
    "           ST_AsText(ST_SnapToGrid(dropoff, params.RES))) AS loc_fc,\n",
    "    ST_AsText(ST_SnapToGrid(pickup, params.RES)) AS pickup_gridpt,\n",
    "    ST_AsText(ST_SnapToGrid(dropoff, params.RES)) AS dropoff_gridpt\n",
    "  FROM\n",
    "    taxitrips, params\n",
    ")\n",
    "  \n",
    "  SELECT *\n",
    "  FROM feateng\n",
    "  \n",
    "  )))\n",
    "SELECT\n",
    "  dropoff_gridpt,\n",
    "  ST_GeogFromText(dropoff_gridpt) AS geom,\n",
    "  COUNT(error) AS numpts,\n",
    "  -- mean absolute percent error\n",
    "  AVG(100 * error) AS MAPE\n",
    "FROM predictions\n",
    "GROUP BY dropoff_gridpt\n",
    "HAVING numpts > 100\n",
    "</pre>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2018 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install google-cloud-bigquery seaborn"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
